Local Scale (3-m) Soil Moisture Mapping Using SMAP and Planet SuperDove

Abstract: A capability for mapping meter-level resolution soil moisture with frequent temporal sampling over large regions is essential for quantifying local-scale environmental heterogeneity and eco-hydrologic behavior. However, available surface soil moisture (SSM) products generally involve much coarser grain sizes ranging from 30 m to several tens of kilometers. Hence, a new method is proposed to estimate 3-m resolution SSM using a combination of multi-sensor fusion, machine-learning (ML), and Cumulative Distribution Function (CDF) matching approaches. This method established favorable SSM correspondence between 3-m pixels and overlying 9-km grid cells from overlapping Planet SuperDove (PSD) observations and NASA Soil Moisture Active-Passive (SMAP) mission products. The resulting 3-m SSM predictions showed improved accuracy by reducing absolute bias and RMSE by ~0.01 cm³/cm³ over the original SMAP data in relation to in situ soil moisture measurements for the Australian Yanco region while preserving the high sampling frequency (1–3 day global revisit) and sensitivity to surface wetness (R 0.865) from SMAP. Heterogeneous soil moisture distributions varying with vegetation biomass gradients and irrigation regimes were generally captured within a selected study area. Further algorithm refinement and implementation for regional applications will allow for improvement in water resources management, precision agriculture, and disaster forecasts and responses.


Introduction
Surface soil moisture (SSM) exerts a fundamental control on land surface hydrological and ecological processes [1,2] and serves as a key environmental input for a variety of scientific studies and applications such as flood and drought monitoring [3,4], wildfire risk assessment [5], and crop yield forecasts [6].
SSM strongly influences soil thermal and dielectric properties, surface reflectance, and vegetation physiology [7]. Both optical-infrared (IR) and microwave remote sensing techniques provide practical approaches for quantifying the spatial distribution and temporal changes of regional SSM through measuring the electromagnetic signatures of the land surface. Optical-IR sensors are well suited for indirectly inferring surface and root zone soil moisture by monitoring the changes of surface thermal properties (e.g., soil temperature, thermal inertia in the case of bare soil) and surface reflectance properties sensitive to vegetation cover and growth, although the observations may be degraded by cloud cover, atmospheric aerosols, and sub-optimal illumination conditions [7]. Microwave remote sensing provides direct measurements of soil dielectric properties, which are highly sensitive to soil moisture changes [8]. Satellite active and passive microwave sensors are also capable of day-and-night and nearly all-weather observations of Earth's surface.
Remote Sens. 2022, 14, 3812
sensors and exploited ML and Cumulative Distribution Function (CDF) matching approaches to derive daily and local-scale (3-m) SSM. The work is potentially useful for studies and applications needing improved quantification of land surface heterogeneity.

Study Region
Our study region is a 63 km × 63 km area located in the Yanco area of southern New South Wales, Australia, consisting of 7 by 7 SMAP 9-km grid cells in a global EASE-GRID v2 projection format (Figure 1). The region is characterized by a semi-arid climate, dominated by cropland and grassland, and has been intensely studied in previous field campaigns [19,34,35]. The 9-km grid cells (gray lines in Figure 1) were used to match SMAP SSM and Planet SuperDove (PSD) observations for training a machine learning model. In situ soil moisture stations within the Yanco network are used as a Core Validation Site for assessing SMAP SSM products [14] and were adopted in this study for algorithm evaluation. A focused study area (3 km × 6 km blue rectangle in Figure 1) was selected for examining local-scale SSM distribution patterns, where intensive soil moisture sampling was conducted in March 2021.

Figure 1. Location of the study region (red rectangle) consisting of 7 by 7 SMAP 9-km grid cells (gray lines) used for preparing spatially and temporally matched data sets for machine learning, Yanco soil moisture network (stations marked by red circles with names labeled) used for assessing the accuracy of downscaled SSM, and a focused study area (dark blue dash lines) with intensive soil moisture sampling for examining local-scale SSM distribution patterns.

Data Sets
Four major data sets were used for generating and assessing the 3-m SSM results, including (a) long-term in situ soil moisture measurements from the Yanco network, (b) intensive ground measurements using the Hydraprobe Data Acquisition System (HDAS) [36] over the focused study area, (c) PSD 8-band imagery, and (d) SMAP Enhanced L3 Radiometer Global daily 9-km soil moisture (SPL3SMP_E; version 5). In addition, elevation and terrain slope data were obtained from a 5-m Digital Elevation Model (DEM) of Australia [37] to account for terrain effects on soil moisture patterns. Our study focused on a 12-month period from January to December 2021, coinciding with the limited availability of in situ SSM and PSD imagery used for model validation.

Yanco Network
The Yanco soil moisture network is part of the larger Murrumbidgee Soil Moisture Monitoring Network (MSMMN), providing long-term and spatially distributed SSM measurements [38]. The MSMMN measures surface soil moisture (0-5 cm or 0-8 cm depth) every 20 min, along with soil temperature and rainfall parameters [38]. The 0-5 cm SSM measurement record from January to December 2021 was used for this study and downloaded from the OZNET website (http://www.oznet.org.au/, accessed on 15 January 2022). Among all Yanco stations, only 14 sites (Figure 1) possessing at least 30 data pairs formed by SMAP SSM and clear-sky PSD observations during the study period were selected for evaluating the proposed downscaling approaches.

Intensive Sampling Using HDAS
Intensive sampling of 0-5 cm depth SSM was made using the HDAS at 50 m spacing over the focused study area for a three-week period (March 2021) during the P-band Radiometer Inferred Soil Moisture 2021 (PRISM-21) campaign (https://www.prism.monash.edu, accessed on 15 January 2022) [39]. Ancillary vegetation and irrigation information was also collected along with soil moisture by the HDAS operators to assist the interpretation of field plot-to-plot soil moisture changes. The intensive sampling data were used to analyze the spatial patterns of the downscaled 3-m SSM results.

PSD 8-Band Imagery
The Planet constellation is currently composed of three generations of satellites, including Dove Classic, Dove-R, and SuperDove, with Equator crossing times between 9:30 and 11:30 [40]. Improved from the first two sensor generations, which capture four-band imagery, SuperDoves allow for eight-band imaging (Coastal Blue, Blue, Green, Green II, Yellow, Red, Red-Edge, and NIR bands) with enhanced image sharpness and quality [40]. Following launches from 2019 to 2022, there are more than 100 SuperDoves on-orbit, which have delivered daily global observations since 2021. In this study, we selected SuperDove imagery obtained with cloud coverage of less than 5% over the study region and period.

SMAP Soil Moisture
The SMAP L-band radiometer has provided twice daily (~6:00 p.m. and a.m. local time) microwave brightness temperature (Tb) observations over the globe since 2015. The SPL3SMP_E was derived from the Tb data spatially interpolated to a finer 9-km global EASE-GRID (v2) resolution using the Backus-Gilbert (BG) technique [41]. The SPL3SMP_E data files record SSM estimates derived using three algorithms, including Single Channel Algorithm (SCA) using V-polarized Tb, SCA using H-polarized Tb, and Dual Channel Algorithm (DCA) [13,14]. As the current baseline, the DCA has shown the best performance among all three SSM retrieval algorithms in comparison with in situ measurements across the globe [42]. The DCA SSM estimates derived from descending (morning pass) SMAP observations were used in this study to pair with PSD morning acquisitions for training an ML model and providing baseline surface wetness information for further downscaling activities. The SMAP data were downloaded through the NSIDC website (https://nsidc.org/data/SPL3SMP_E/versions/5, accessed on 15 January 2022).

Data Processing Using GEE
The GEE is a web-based platform capable of efficient archiving, processing, visualizing, and analysis of multi-petabyte and multi-source data [33]. The high-performance cloud computation capabilities of GEE are suitable for the integrated analysis of large geospatial data sets for environmental applications. For our study, we relied on the GEE Application Programming Interface (API) available in the Google Colab Notebook environment for matching the multi-source data spatially and temporally. Specifically, the SMAP HDF-format data were first converted to GeoTIFF format and then uploaded to GEE as an image collection. Similarly, the PSD images were uploaded to GEE as another image collection. For linking SMAP SSM and PSD spectral observations, SuperDove 3-m pixels were aggregated for each SMAP 9-km grid cell (Figure 1) using GEE by averaging all pixels within the associated grid cell and then matched with SMAP SSM based on date and location. The SMAP and PSD data pairs were then used for training an ML model in the Colab Notebook environment.
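The aggregation and matching step described above can be illustrated outside GEE with a minimal NumPy sketch. This is not the GEE API and not the authors' code: the function names `aggregate_to_coarse` and `pair_by_date`, the dictionary layout keyed by date string, and the toy 4 × 4 array are all assumptions made for illustration. A real 9-km cell would span roughly 3000 × 3000 pixels at 3-m resolution.

```python
import numpy as np

def aggregate_to_coarse(fine, block):
    """Average non-overlapping block x block windows of a 2-D array, ignoring NaNs."""
    rows, cols = fine.shape
    if rows % block or cols % block:
        raise ValueError("array dimensions must be multiples of the block size")
    # Axis 1 and axis 3 index the pixels inside each coarse cell.
    windows = fine.reshape(rows // block, block, cols // block, block)
    return np.nanmean(windows, axis=(1, 3))

def pair_by_date(coarse_ssm, coarse_refl):
    """Join two {date: 2-D array} dicts on shared dates, one tuple per valid (date, cell)."""
    pairs = []
    for date in sorted(set(coarse_ssm) & set(coarse_refl)):
        ssm, refl = coarse_ssm[date], coarse_refl[date]
        for (r, c), value in np.ndenumerate(ssm):
            if np.isfinite(value) and np.isfinite(refl[r, c]):
                pairs.append((date, r, c, float(refl[r, c]), float(value)))
    return pairs

# Toy example (hypothetical values): a 4 x 4 fine grid averaged into 2 x 2 coarse cells.
fine = np.arange(16, dtype=float).reshape(4, 4)
coarse = aggregate_to_coarse(fine, 2)
```

Using `np.nanmean` is one way to let cloud-masked pixels (stored as NaN) drop out of the cell average; whether the original processing masked pixels this way is not stated in the text.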

Approach Overview
This approach (Figure 2) involves a combination of multi-sensor data fusion, ML, and CDF matching for generating daily SSM maps at 3-m spatial resolution. For the ML step, the relationships between predictor variables and the SSM target variable were first derived under SMAP 9-km grid cells. The trained model was then applied to 3-m pixels when both SMAP retrievals and overlapping clear-sky PSD observations were available. Thirty-nine predictor variables were used in this study, including reflectances from all eight PSD spectral bands, normalized reflectance differences calculated from PSD band pairs, terrain information, and the number of each ten-day period in the calendar year (N10DOY) (Table 1).
To overcome the missing PSD data issue caused by cloud and atmospheric constraints, CDF matching was applied to each 3-m pixel to establish relationships between the SSM target variable and overlying SMAP SSM retrievals and then to apply these empirical models to derive 3-m SSM maps using the SMAP product as the sole model input.
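To make the count of 39 predictors concrete, the following Python sketch assembles 8 band reflectances, 28 pairwise normalized differences, elevation, slope, and N10DOY for a single pixel. The feature names, the sign convention of the normalized difference (later band minus earlier band, so that the red/NIR pair equals NDVI), and the N10DOY formula (day of year grouped into consecutive ten-day periods starting at 1) are assumptions for illustration; the source does not spell them out.

```python
from datetime import date
from itertools import combinations

# Band order follows the eight SuperDove bands listed in the text.
BANDS = ["coastal_blue", "blue", "green", "green_ii",
         "yellow", "red", "red_edge", "nir"]

def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def n10doy(day):
    """Assumed definition: index of the ten-day period in the year, starting at 1."""
    return (day.timetuple().tm_yday - 1) // 10 + 1

def build_predictors(reflectance, elevation, slope, day):
    """Assemble the 8 + 28 + 3 = 39 predictors for one pixel as a dict."""
    features = {name: reflectance[name] for name in BANDS}
    for first, second in combinations(BANDS, 2):
        # Assumed sign convention: later band minus earlier band.
        features[f"index_{first}_{second}"] = normalized_difference(
            reflectance[second], reflectance[first])
    features["elevation"] = elevation
    features["slope"] = slope
    features["n10doy"] = n10doy(day)
    return features

# Hypothetical reflectance values for a single pixel.
pixel = {name: 0.1 for name in BANDS}
pixel["nir"] = 0.5
predictors = build_predictors(pixel, elevation=130.0, slope=0.5, day=date(2021, 3, 8))
```

Because regression trees split on thresholds, the sign convention of each index should not affect a tree-based model; it only matters for readability of indices such as NDVI and NDRE.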

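Table 1. Predictor variables.
Variable                  Description                                               Number
PSD band reflectances     Reflectance from each of the eight PSD spectral bands     8
Index_Band ij             Normalized reflectance difference between band i and j    28
Elevation                 Terrain elevation from the 5-m DEM                        1
Slope                     Terrain slope from the 5-m DEM                            1
N10DOY                    The number of each 10-day period in a year                1

The stepwise algorithm (Figure 2) description is listed below.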
(a) Aggregate all predictor variables for 9-km SMAP grid cells and pair with the corresponding SMAP SSM for the same dates using GEE.
(b) Perform region-independent cross-validations for model assessment by dividing the SMAP grid cells into seven rows from north to south (Figure 1), selecting data associated with every six rows for model training, and using data from the remaining row for validation. A total of 2100 SMAP and PSD data pairs were used for the assessment.
(c) Select the best-performing model from the resulting ML algorithms, and apply it to the 3-m PSD data under clear-sky conditions.
(d) For a given 3-m pixel, perform CDF matching for the SSM estimates of the pixel and the associated SMAP values of the overlying 9-km grid cell, and generate 3-m soil moisture estimates using only the SMAP retrievals as model inputs.
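The row-wise cross-validation in step (b) can be sketched as a leave-one-row-out split. The helper name `row_folds` and the toy layout of one sample per grid cell are hypothetical; the study used 2100 data pairs, which this example does not reproduce.

```python
def row_folds(sample_rows, n_rows=7):
    """Yield (held_out_row, train_indices, valid_indices) for each grid row.

    sample_rows[i] is the north-to-south row index (0..n_rows-1) of the
    SMAP grid cell that sample i belongs to.
    """
    for held_out in range(n_rows):
        train = [i for i, row in enumerate(sample_rows) if row != held_out]
        valid = [i for i, row in enumerate(sample_rows) if row == held_out]
        yield held_out, train, valid

# Toy layout (assumption): one sample per cell of the 7 x 7 grid, in row-major order.
sample_rows = [cell // 7 for cell in range(49)]
folds = list(row_folds(sample_rows))
```

Holding out a whole row keeps spatially adjacent cells from appearing in both the training and validation sets of the same fold, which is one reading of "region-independent" in the text.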

Machine-Learning Methods
Regression-tree-based ML models and conventional linear regression were used first to establish model relationships between SSM and predictor variables at the SMAP spatial scale (9 km for this study); the resulting trained models were then fed high-resolution inputs to obtain local-scale (3-m) SSM estimates. The regression-tree models have shown high efficiency, good accuracy, and robustness in satellite SSM downscaling studies [16,26] and are generally less affected by data noise and overfitting issues in comparison with other machine learning approaches [43].
The regression-tree methods assessed include Random Forest (RF) [43], Gradient Boosting regression (GBRegressor) [44], and Light Gradient Boosting Machine regression (LightGBRegressor) [45]. The regression-tree mechanism enables cost-sensitive learning and probability tree estimation [46]. Trees are fit using subsets of explanatory variables, and the residual "error" between the predicted and actual values is assessed. For the RF approach, a number of trees were built using bootstrap samples, and the final regression results were obtained from the forest to provide more accurate and stable predictions than achieved using a single tree [47]. The GBRegressor and LightGBRegressor methods are enhanced versions of tree regression algorithms that adopt gradient boosting, which iteratively constructs the model using the prediction errors from each round [44,45]. In addition to the gradient boosting framework, LightGBRegressor has unique features such as a histogram-based algorithm, exclusive feature bundling, leaf-wise tree growth strategy, histogram difference acceleration, and sequential access gradient methods, which enable high computation efficiency and accuracy [45]. All of the models except LightGBRegressor were implemented using the Python scikit-learn library (https://scikit-learn.org/, accessed on 15 March 2022), with the LightGBRegressor method coded using an independent Python library (https://lightgbm.readthedocs.io, accessed on 15 March 2022). To enable model inter-comparisons, the same sets of training and validation data were used for each model. The model parameters used for tuning and evaluating the machine learning methods are listed in Table 2. For a given parameter, if two values gave similar performance (e.g., R² difference less than 0.0025), the smaller value was selected to avoid over-fitting.
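The parameter selection rule in the last sentence can be expressed as a small helper. This is one possible reading of the rule, not the authors' tuning code: the function name `pick_parameter`, the comparison against the best score, and the example scores are assumptions.

```python
def pick_parameter(scores, tolerance=0.0025):
    """Return the smallest parameter value whose R2 is within tolerance of the best.

    scores maps a candidate parameter value to its validation R2.
    """
    best = max(scores.values())
    candidates = [value for value, r2 in scores.items() if best - r2 < tolerance]
    return min(candidates)

# Hypothetical validation scores for a tree-count parameter.
chosen = pick_parameter({50: 0.850, 100: 0.856, 200: 0.857})
```

Here 200 trees score highest, but 100 trees fall within 0.0025 of that score, so the smaller value is preferred; 50 trees fall outside the tolerance and are rejected.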

Input Variables Used for ML-Based SSM Prediction
As a key parameter in the soil-plant-atmosphere continuum, soil moisture affects (a) plant water status, composition, structure, and growth; (b) water and energy exchange between the atmosphere and land surface; and (c) the partitioning of rainfall between runoff and infiltration [48,49]. The vertical and horizontal distributions of soil moisture are also regulated by water and energy cycle processes through precipitation, plant canopy interception and evapotranspiration, runoff, and infiltration. Considering the inter-connections among soil moisture and other water and energy cycle components, a number of factors influencing or reflecting the fine-scale soil moisture distributions have been considered in ML-based SSM downscaling studies, including land cover type, vegetation conditions, soil texture, land surface temperature, and topographic information [26]. Different from downscaling studies targeting 30-m to 1000-m SSM mapping, it is challenging to estimate meter-level SSM due to the lack of supporting data at comparable scales. For this study, the feasibility of using PSD 3-m reflectance observations and ancillary terrain and date information (Table 1) for ML-based SSM mapping at local scales was examined. The PSD spectral reflectances and associated indices allow for spectra-based descriptions of surface vegetation and soil conditions at 3-m resolution, the terrain elevation and slope variables govern soil water distributions, and N10DOY is used to account for the general SSM seasonality.

CDF Matching for Generating Daily SSM Record
CDF matching is a non-linear method widely used for removing biases among soil moisture data sets derived from different approaches, such as satellite passive/active microwave remote sensing, reanalysis, and in situ measurements [50-52]. For this study, CDF matching was used to relate SSM values obtained from 9-km grid cells and 3-m pixels and to generate more temporally continuous SSM data at 3-m resolution. Specifically, SSM data for a given 3-m pixel were first generated using the ML approach and paired with the corresponding SMAP results for the overlying 9-km grid cell. CDF matching was then used to align SMAP SSM data with the 3-m results. The polynomial function fitted between the two data sets was finally applied to the whole SMAP time series to obtain the corresponding SSM estimates for the 3-m pixels without relying on additional PSD observations or ancillary data. Local-scale SSM mapping was thus achieved while preserving the frequent temporal sampling of SMAP.
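A minimal sketch of this per-pixel CDF matching is given below, assuming that the two paired samples are ranked independently and that a polynomial is fitted between the ranked values. The polynomial degree, the function name `fit_cdf_matching`, and all numbers are illustrative assumptions; the source does not report the degree used.

```python
import numpy as np

def fit_cdf_matching(smap_paired, fine_paired, degree=3):
    """Fit a polynomial that maps SMAP SSM onto 3-m SSM at equal CDF ranks.

    Sorting each sample separately pairs values with the same empirical
    cumulative probability, which is the core of CDF matching.
    """
    smap_ranked = np.sort(np.asarray(smap_paired, dtype=float))
    fine_ranked = np.sort(np.asarray(fine_paired, dtype=float))
    if smap_ranked.shape != fine_ranked.shape:
        raise ValueError("paired samples must have the same length")
    return np.poly1d(np.polyfit(smap_ranked, fine_ranked, degree))

# Hypothetical paired values for one 3-m pixel on clear-sky dates.
smap_paired = [0.10, 0.30, 0.20, 0.40, 0.25, 0.35]
fine_paired = [0.12, 0.26, 0.18, 0.33, 0.21, 0.30]
mapper = fit_cdf_matching(smap_paired, fine_paired, degree=2)

# Apply the fitted mapping to the full SMAP record of the overlying grid cell.
smap_full_record = np.array([0.15, 0.22, 0.28, 0.38])
fine_full_record = mapper(smap_full_record)
```

A fitted polynomial can behave poorly outside the range of the paired SMAP values, so SMAP retrievals far wetter or drier than any paired date would need separate handling in practice.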

Algorithm Assessment
The algorithm assessment consisted of three parts: (a) examining the performance of the ML models in predicting SSM at 9-km resolution based on the independent validation data (Section 3.1), using the coefficient of determination (R²) and root mean square error (RMSE) calculated between SMAP SSM and the predicted values as performance metrics; (b) a quantitative assessment of the ML and CDF matching based SSM downscaling results using the original SMAP 9-km product, the resulting 3-m data sets, and in situ measurements, which involved 14 Yanco sites having at least 30 data pairs of SMAP and ML-predicted SSM available for the CDF calculation and used R², RMSE, and absolute bias as statistical metrics; and (c) a qualitative assessment of the resulting local-scale SSM patterns by comparing the 3-m model predictions with the HDAS intensive SSM sampling results.
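For reference, the three metrics can be computed as in the sketch below. Two definitions are assumed here because the text does not state them: R² is taken as the squared Pearson correlation, and absolute bias as the absolute value of the mean difference between predictions and observations. The function name `evaluate` and the sample values are illustrative.

```python
import math

def evaluate(predicted, observed):
    """Return R2 (squared Pearson r), RMSE, and absolute bias for paired series."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / math.sqrt(var_p * var_o)
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    return {"r2": r * r, "rmse": rmse, "abs_bias": abs(mean_p - mean_o)}

# Hypothetical SSM values in cm3/cm3: predictions are uniformly 0.05 too dry.
metrics = evaluate([0.10, 0.20, 0.30], [0.15, 0.25, 0.35])
```

In this example the correlation is perfect while RMSE and absolute bias both equal the constant offset, which illustrates why correlation and bias are reported separately.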

Assessing the Performance of ML Models in Predicting SSM at 9-km Resolution
In situ soil moisture observations representative of different land cover types, seasons, and geo-locations are scarce and insufficient to support ML model training for global SSM mapping. Therefore, this study trained ML models using the SMAP 9-km SSM product, which has global coverage every 1-3 days. The assessment at the 9-km grid cells showed that the ML models are capable of rebuilding SMAP SSM from finer-grained PSD observations and ancillary information (Table 3). The regression-tree-based methods have similar accuracy when comparing their predictions with the overlying SMAP SSM retrievals (R² values from 0.846 to 0.857; RMSEs from 0.030 to 0.029 cm³/cm³) while also outperforming the conventional linear regression approach (R² 0.591; RMSE 0.050 cm³/cm³). Among the regression-tree methods, LightGBRegressor is slightly superior to the others in having the highest correlation (R² 0.857) and lowest uncertainty (RMSE 0.029 cm³/cm³) (Figure 3; Table 3).
The predictor contribution scores are informative for understanding ML prediction mechanisms and the inter-connections among different input variables and SSM. Among the 39 predictors, the 5 most important variables are N10DOY, near-infrared band reflectance, terrain slope, elevation, and red-edge band reflectance (Table 4). The Normalized Difference Red Edge (NDRE) index, involving red-edge and near-infrared bands, was also an important predictor but with a lower (2.7%) contribution. Low NDRE values typically represent bare soil or stressed vegetation, while higher values indicate healthy vegetation due to NDRE sensitivity to vegetation chlorophyll [53]. Compared to red-edge band reflectance (4.2%) and NDRE (2.7%), the red band and NDVI for measuring vegetation greenness had lower importance, with respective contributions of approximately 1.6% and 1.3%.

Assessing SSM Predictions at 3-m Resolution
The LightGBRegressor model trained over 9-km grid cells was applied to 3-m pixels for the dates when clear-sky PSD observations were available. After initial predictions over the limited dates, the SSM estimates from LightGBRegressor and SMAP were paired in space and time and then used for CDF matching. The polynomial fit for removing SMAP SSM bias relative to ML-based estimates for a given 3-m pixel was finally applied to the entire SMAP record for generating 3-m SSM time series without the use of additional PSD data.
Comparisons among SMAP 9-km, LightGBRegressor 3-m, and additional CDF matching 3-m results were made for a limited number of dates when all three data sets were available. For 10 of the 14 Yanco sites, CDF matching led to higher accuracy and lower bias of the 3-m SSM predictions in relation to in situ measurements than the 9-km SMAP SSM retrievals (Table 5). For example, the CDF matching results for Site Y12 dramatically reduced the bias level of SMAP SSM (SMAP absolute bias 0.073 cm³/cm³; CDF matching absolute bias 0.040 cm³/cm³) while removing outliers of the LightGBRegressor predictions (LightGBRegressor RMSE 0.055 cm³/cm³; CDF matching RMSE 0.045 cm³/cm³) (Table 5) and enabling continuous SSM predictions (Figure 4a). For the other four sites, the LightGBRegressor predictions provided no additional value in refining the SMAP SSM time series. For example, the LightGBRegressor SSM results for site Yb5e were very similar to those of SMAP, which did not help to remove SMAP SSM biases in the subsequent CDF matching process (Figure 4b). Overall, the CDF-matching-based SSM results showed a correlation as high as that of the SMAP product (R 0.864) and an absolute bias as low as that of the LightGBRegressor estimates (0.043 cm³/cm³), which led to the lowest RMSE (0.062 cm³/cm³) in the comparisons with the in situ SSM measurements (Table 5).

The CDF-matching approach used for the paired LightGBRegressor and SMAP SSM estimates was further applied to the entire SMAP record to generate a continuous 3-m SSM record over the study domain. Among the 14 validation sites, 11 sites showed higher accuracy of the downscaled 3-m SSM data than the original SMAP 9-km results relative to the in situ SSM ground truth measurements (Table 6). Overall, the CDF-matching results lowered the RMSE and absolute bias by ~0.01 cm³/cm³, with RMSE decreasing from 0.081 cm³/cm³ to 0.070 cm³/cm³, while absolute bias declined from 0.052 cm³/cm³ to 0.045 cm³/cm³ (Table 6).

Evaluating SSM Spatial Distributions at 3-m Resolution
For the focused study area (Section 2.1; Figure 1), spatial distributions of SSM at 3-m resolution derived from the combined ML and CDF matching approach were compared with SSM maps interpolated from the HDAS intensive sampling results using the Inverse Distance Weighted (IDW) approach for 8, 15, and 26 March 2021 (Figure 5a-f). It is noted that multiple independent SSM measurements were taken using HDAS for each location, and the readings may vary widely in highly heterogeneous areas [39]. The interpolation results were generated based on mean soil moisture values averaged from all readings over a given location. Detailed land cover information (Figure 5g) was also used for the comparisons. The dates were selected for examining the local-scale SSM patterns under distinct surface wetness conditions. In addition, Plots 1 and 6 (Figure 5g) represent managed croplands, which were used to examine irrigation impacts on SSM patterns.
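The averaging of repeat readings followed by IDW interpolation can be sketched as below. The distance power of 2, the function names `mean_per_location` and `idw`, and the coordinates and readings are assumptions for illustration; the text does not report the IDW settings used.

```python
import math
from collections import defaultdict

def mean_per_location(readings):
    """Average repeat readings, given as (x, y, value), at each sampling location."""
    grouped = defaultdict(list)
    for x, y, value in readings:
        grouped[(x, y)].append(value)
    return [(x, y, sum(vals) / len(vals)) for (x, y), vals in grouped.items()]

def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (px, py, value) points."""
    weighted_sum = weight_total = 0.0
    for px, py, value in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return value  # exactly on a sampling location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical readings in cm3/cm3 at two locations 50 m apart, with repeats at the first.
readings = [(0.0, 0.0, 0.08), (0.0, 0.0, 0.12), (50.0, 0.0, 0.30)]
points = mean_per_location(readings)
midpoint_ssm = idw(points, 25.0, 0.0)
```

At the midpoint both locations receive equal weight, so the estimate is the simple average of the two location means.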
Overall, the downscaled results (Figure 5a-f) captured the contrasting wetness conditions of the three dates as indicated by the HDAS measurements, with high correlation between the two data sets (R 0.81). For 8 March, in contrast to the low SSM conditions in most plots, the 3-m results showed wetter soil conditions in the irrigated field (Plot 1), though to a lesser extent than the HDAS readings (Figure 5a,b). Relative to 8 March, larger spatial heterogeneity was found over the fields on 15 March, along with similar wetness gradients between non-irrigated (e.g., Plots 8, 9, and 10) and irrigated (e.g., Plot 6) fields captured from both the downscaled results and in situ measurements (Figure 5c,d). For 26 March, despite the overall wet conditions, relatively drier soil conditions (Plots 8, 9, and 10) and a noticeable south-north wetness gradient (Plot 5) were apparent in both the downscaled results and HDAS data (Figure 5e,f).
Major inconsistencies between the two data sets were found in Plots 3 and 4 over 8 and . The HDAS measurements differ from the downscaled results in data acquisition time and spatial representativeness, so the correlation analysis results need to be interpreted cautiously, in particular when high SSM heterogeneity exists and leads to a large difference in HDAS readings at ground sampling locations.
For the areas without HDAS measurements, linear features with lower/higher SSM relative to surrounding pixels were associated with road/drainage systems (Figure 5a,c,e), and higher SSM was found over more densely vegetated areas partially covered by trees (e.g., grass and trees). In comparison, the two 9-km SMAP grid cells (Figure 1) overlying the focused study region were unable to capture a similar level of SSM variability.

Discussion
This study proposed a new approach for deriving both high temporal and spatial resolution SSM data using a combination of ML, statistical modeling, and multi-sensor fusion. Of the ML approaches tested, the LightGBRegressor method showed the best performance in estimating SSM using independent and reflectance-based observations over 9-km grid cells (R² 0.857; RMSE 0.029 cm³/cm³). Other regression-tree-based methods are also suitable for this application, given their comparable performance to the LightGBRegressor results (Table 3). These alternative approaches include traditional RF methods widely used for SSM downscaling [20,26] and additional RF refinements using gradient boosting methods. One advantage of this new approach is that the model training does not rely on any in situ observations or measurements from airborne campaigns, which enables the approach to be generally applicable to other regions where high-quality SMAP retrievals are available.
When analyzing the importance of SSM predictor variables, the NDRE was weighted more than NDVI in the ML prediction. Considering the inherent relationships between vegetation growth and soil wetness, NDVI was among the main inputs for downscaling passive microwave SSM [24,26], while this study suggests that vegetation health conditions represented by NDRE are likely related more closely to SSM than the greenness quantified by NDVI. Soil moisture in the Yanco region has clear seasonality with generally drier conditions in the austral summer (DJF) and wetter soil conditions in winter (JJA) [52], which likely led to the relatively high importance of N10DOY in the SSM prediction. In addition, elevation and slope factors are among the most important predictors, which reflect the impacts of topographic control on SSM spatial distributions [26,54]. Despite the high performance of the ML models over the 9-km grid cells, this new approach may still be constrained by the limited spectral information provided by PSD 8-band observations. Richer spectral information, in particular additional thermal band observations used in previous studies [24], may enable further improvements in model performance.
There are two challenges in deriving meter-level SSM using an ML model trained on coarser 9-km grid cell data. One is that the training data sets representing or aggregated for 9-km grid cells may not be comprehensive enough to cover the range of local variability represented by 3-m pixels. The training data were collected from 49 SMAP 9-km grid cells spanning a 12-month period and accounting for a variety of surface soil and vegetation conditions. However, larger study regions and a longer training period would likely enable further algorithm enhancement. The other is the relatively sparse temporal coverage of clear-sky PSD observations: frequent cloud cover caused approximately 80% of all possible PSD observations to be missing during the study period. Although clouds are a well-known constraint in optical remote sensing, this limits the capability of downscaling approaches for generating the continuous SSM products needed in many applications, such as irrigation management [20]. In addition, despite the overall lower RMSE and bias of the LightGBRegressor results compared with the 9-km SMAP SSM results relative to the in situ measurements (Table 5), the 3-m ML predictions had lower correlations and more outliers (e.g., Figure 4a). To increase the temporal fidelity, the LightGBRegressor results paired with the corresponding SMAP SSM values were fed into an additional CDF matching process. The CDF matching removes outliers in the ML predictions, which are likely caused by noise in PSD reflectance observations under suboptimal atmospheric conditions, and, more importantly, builds a continuous SSM time series by accounting for the cross-scale SSM relationships for each 3-m pixel. After applying CDF matching, it was possible to generate SSM time series at 3-m resolution with RMSE and bias as low as those of the LightGBRegressor results while maintaining a correlation with in situ observations similar to that of the SMAP product.
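The idea of CDF matching for a single 3-m pixel can be illustrated with a generic empirical quantile-mapping sketch in Python. This is not the exact implementation used in this study: the function names, the number of quantiles, and the synthetic data are all assumptions made for illustration. The paired samples stand for the clear-sky dates on which both a 9-km SMAP value and a 3-m ML prediction exist, and the fitted mapping is then applied to the full SMAP time series.

```python
import numpy as np

def fit_cdf_match(coarse_paired, fine_paired, n_quantiles=21):
    """Build matching quantile tables for the coarse (9-km) and fine (3-m) samples."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    return np.quantile(coarse_paired, probs), np.quantile(fine_paired, probs)

def apply_cdf_match(coarse_series, coarse_q, fine_q):
    """Map coarse values to the fine-scale value at the same cumulative probability.

    np.interp clamps inputs outside the fitted range to the end quantiles.
    """
    return np.interp(coarse_series, coarse_q, fine_q)

# Synthetic paired samples (illustrative only): the fine-scale pixel is assumed
# to be a drier, damped version of the coarse grid cell.
rng = np.random.default_rng(0)
smap_paired = rng.uniform(0.05, 0.40, size=200)
ml_paired = 0.8 * smap_paired + 0.02

coarse_q, fine_q = fit_cdf_match(smap_paired, ml_paired)

# Apply the mapping to a continuous coarse series, including dates with no PSD data.
smap_series = np.linspace(smap_paired.min(), smap_paired.max(), 50)
ssm_3m = apply_cdf_match(smap_series, coarse_q, fine_q)
```

Because the mapping depends only on the two distributions, isolated noisy ML predictions have limited influence on the result, while the temporal dynamics are inherited from the continuous coarse-scale series.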
It is worth noting that the underlying assumptions of our approach are that (a) the SMAP product has high accuracy over the 9-km grid cells, as has been shown for the Yanco region [14], (b) station-based SSM measurements are representative of the overall surface wetness of 3-m pixels, and (c) the ML results can capture the unique soil wetness conditions for a given 3-m pixel. These assumptions held for most of the 14 sites examined (10 sites in Table 5; 11 sites in Table 6), which showed better SSM performance of the downscaled results than the original SMAP product. If the above assumptions are incorrect, no improvement in SSM at fine scales would be expected since the LightGBRegressor results would fail to represent the surface wetness level observed on site (e.g., Figure 4b). The sites without performance enhancement in the downscaled results relative to the SMAP product (e.g., Yb5e, Yb5d, and Yb3; Tables 5 and 6) are concentrated in a small area with relatively homogeneous soil properties and land use (mainly pasture) [55]. The lack of fine-scale variations in surface properties in this area likely leads to little added contribution to the SSM estimations from high-resolution SuperDove observations. There are no additional measurements within the 3-m pixels for evaluating the spatial representativeness of the in situ SSM measurements; however, differences between the 3-m pixel results and in situ SSM measurements, which may be biased due to site installation and management activities [55], likely contribute to uncertainties in the model validation (Section 4.2; Tables 5 and 6).
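The site-level comparisons discussed above rest on three summary statistics: bias, RMSE, and correlation. A minimal Python sketch of how such statistics could be computed for one site is given below; the function name, the sign convention for bias (estimate minus observation), and the sample values are assumptions for illustration and may differ from the conventions used for Tables 5 and 6.

```python
import numpy as np

def validation_metrics(predicted, observed):
    """Return bias, RMSE, and Pearson R over dates where both series are finite."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    valid = np.isfinite(p) & np.isfinite(o)
    diff = p[valid] - o[valid]
    return {
        "bias": float(diff.mean()),                  # assumed: estimate minus observation
        "rmse": float(np.sqrt((diff ** 2).mean())),
        "r": float(np.corrcoef(p[valid], o[valid])[0, 1]),
    }

# Hypothetical SSM values in cm³/cm³; the last in situ reading is missing.
in_situ = np.array([0.10, 0.20, 0.30, np.nan])
estimate = np.array([0.12, 0.22, 0.32, 0.25])

scores = validation_metrics(estimate, in_situ)
```

In this constructed case the estimate is uniformly wetter than the observation, so the correlation is perfect even though bias and RMSE are non-zero, which shows why the three statistics need to be read together.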
The 3-m SSM distributions over the focused study area were examined under three contrasting wetness conditions. In general, irrigated farms and denser vegetation cover corresponded with higher 3-m SSM levels, while non-irrigated land, bare ground, and roads showed lower wetness. The fine-scale land features with significant SSM spatial variations caused by different irrigation regimes and vegetation cover were generally captured by the downscaled results. A major issue identified is the SSM overestimation relative to the HDAS measurements (e.g., Plots 3 and 4), which was likely caused by high-level retrieval biases as found in other locations (Tables 5 and 6). In addition, the 3-m SSM estimates for irrigated fields were higher than those for the surrounding fields but lower than the HDAS measurements. The sub-daily irrigation signals are likely partially missed in the slower changes of vegetation conditions captured by the PSD observations, which led to the underestimation of downscaled SSM.

Conclusions
Capabilities for mapping meter-scale soil moisture conditions globally and with high temporal repeat are essential to local-scale environmental studies and applications. Here, a new method for 3-m SSM mapping was developed by integrating information from satellite passive microwave and high-resolution optical remote sensing through machine-learning and CDF-matching approaches. Compared with the original 9-km SMAP product, the resulting 3-m SSM predictions showed higher accuracy and lower bias in relation to independent soil moisture observations from the Yanco region while preserving the high sensitivity to surface wetness and temporal coverage of SMAP. Complex soil moisture patterns consistent with heterogeneous land cover and vegetation conditions within a focused study area were captured only by the downscaled results. Potential algorithm refinement and implementation for global regions would enable quantification of local-scale land surface heterogeneity and processes for improving the assessment of local environmental changes, disaster risk mitigation, water resources management, and precision agriculture.