Comparison of Machine Learning-Based Snow Depth Estimates and Development of a New Operational Retrieval Algorithm over China

: Snow depth estimation with passive microwave (PM) remote sensing is challenged by spatial variations in the Earth’s surface, e.g., snow metamorphism, land cover types, and topography. Thus, traditional static snow depth retrieval algorithms cannot capture snow thickness well. In this study, we present a new operational retrieval algorithm, hereafter referred to as the pixel-based method (0.25 ◦ × 0.25 ◦ grid-level), to provide more accurate and nearly real-time snow depth estimates. First, the reference snow depth was retrieved using a previously proposed model in which a microwave snow emission model was coupled with a machine learning (ML) approach. In this process, an effective grain size (effGS) value was optimized by utilizing the snow microwave emission model, and then the nonlinear relationship between snow depth and multiple predictive variables, e.g., effGS, longitude, elevation, and brightness temperature (Tb) gradients, was established with the ML technique to retrieve reference snow depth data. To select a robust and well-performing ML approach, we compared the performance of widely used support vector regression (SVR), artiﬁcial neural network (ANN) and random forest (RF) algorithms over China. The results show that the three ML models performed similarly in snow depth estimation, which was attributed to the inclusion of effGS in the training samples. In this study, the RF model was used to retrieve the snow depth reference dataset due to its slightly stronger robustness according to our comparison of results. Second, the pixel-based algorithm was built based on the retrieved reference snow depth dataset and satellite Tb observations (18.7 GHz and 36.5 GHz) from Advanced Microwave Scanning Radiometer 2 (AMSR2) during the 2012–2020 period. For the pixel-based algorithm, the ﬁtting coefﬁcients were achieved dynamically pixel by pixel, making it superior to the traditional static methods. Third, the built pixel-based algorithm was veriﬁed using ground-based observations and was compared to the AMSR2, GlobSnow-v3.0, and ERA5-land products during the 2012–2020 period. The pixel-based algorithm exhibited an overall unbiased root mean square error (unRMSE) and R 2 of 5.8 cm and 0.65, respectively, outperforming GlobSnow-v3.0, with unRMSE and R 2 values of 9.2 cm and 0.22, AMSR2, with unRMSE and R 2 values of 18.5 cm and 0.13, and ERA5-land, with unRMSE and R 2 values of 10.5 cm and 0.33, respectively. However, the pixel-based algorithm estimates were still challenged by the complex terrain, e.g., the unRMSE was up to 17.4 cm near the Tien Shan Mountains. The proposed pixel-based algorithm in this study is a simple and operational method that can retrieve accurate snow depths based solely on spaceborne PM data in comparatively ﬂat areas.


Introduction
As a key parameter indicating snow mass, snow water equivalent (SWE) plays a vital role in processes related to fresh drinking water for humans and animals, agricultural water uses and the prevention of natural hazards [1][2][3][4][5]. Thus, accurate knowledge of the spatiotemporal variations in SWE would improve hydrological forecasts and water resource management [6][7][8].
Satellite passive microwave (PM) remote sensing is an available and effective tool for monitoring global SWE due to its daily observation capability, good spatial coverage, and high sensitivity to the amount of snow [9,10]. The underlying physics of PM-based SWE retrieval is that the volume scattering of microwave radiation in snow varies with wavelength [11]. Typically, long waves (e.g., K-band) have strong penetrability to snow, reflecting the radiation characteristics of underground surfaces, whereas short waves (e.g., Ka-band) are easily scattered by snow particles. Thus, the contrast between the microwave brightness temperature (Tb) at the K-and Ka-bands can yield SWE estimates.
PM SWE retrieval algorithms are structured into empirical and semiempirical formulas [9,[11][12][13][14][15][16], physical-based methods [17][18][19][20][21][22][23], machine learning (ML) approaches [24][25][26][27][28][29][30], and assimilation models [31][32][33][34][35][36]. The most widely used methods for operational product generation are still empirical and semiempirical algorithms. This is because these algorithms are independent of complex auxiliary data, computationally inexpensive, and easy to operate. In addition, empirical algorithms can provide snow depth estimates in real time, which plays an important role in disaster monitoring and relief efforts. Physical-based methods have low computational efficiency. In addition, rich prior knowledge is required to run physical emission models of snowpack. Moreover, the accuracy of the snow emission model is questionable at the satellite pixel scale (~25 km). Thus, the physical-based algorithm is less suitable for operational purposes. The assimilation methods depend on auxiliary data, e.g., ground-based observations and weather forcing datasets, to correct the estimated SWE. Moreover, land surface models are generally used to provide input parameters for snow emission models (observation operators), while atmospheric reanalysis data are required to implement the land surface model. In this process, some potential problems, e.g., model structure errors and model forcing errors, probably lead to large uncertainties in global-scale SWE retrieval [4,35,37]. Thus, the methodology of the assimilation model is very complex and not suitable for operation.
ML-based algorithms are currently popular and widely used to retrieve land surface parameters. ML techniques are a powerful set of tools for fitting the multivariate nonlinear relationships between predictor variables and a given dependent variable. Generally, the relationship between snow depth and satellite-observed Tb is nonlinear due to snow metamorphism, forest canopy, atmosphere, saturation effect and wet snow [38][39][40][41][42]. The greatest challenge is how to consider those factors in a defined algorithm. Unfortunately, this kind of retrieval algorithm has not yet been built due to an unclear physical basis. Although ML's "black box" weakness has received substantial criticism from the scientific community, it initiated the era of data-driven models and will benefit traditional quantitative models. Previous studies have demonstrated that ML techniques present promising performance in retrieving stationary land surface parameters, e.g., soil types and soil properties. However, they are also challenged by the nonstationary Earth system [43]; for example, temporally and spatially dynamic snow microstructures still limit ML applications in snow depth retrieval. Our previous study explored the potential of the random forest (RF) approach in snow depth retrieval and demonstrated that a well-trained RF algorithm in a specific area is not applicable over different regions because snow characteristics change both spatially and over time, and there was no prior snowpack information (predictor variable) in the RF training samples [29]. To address this problem, we proposed a methodology that combines the snow microwave emission model with the RF approach [30]. In this methodology, an effective GS value (effGS), a prior snowpack descriptor, was optimized by utilizing the Helsinki University of Technology (HUT) model by minimizing the difference between the Advanced Microwave Scanning Radiometer 2 (AMSR2) observations (both 18.7 and 36.5 GHz) and HUT simulations. Then, we used the RF model to build the nonlinear relationship between snow depth and selected predictor variables, such as vertical polarized Tb differences between 18.7 and 36.5 GHz observations and 10.65 and 36.5 GHz observations, longitude and elevation, and the optimized effGS.
We previously reported the baseline retrieval algorithm and demonstrated its performance in snow depth estimation [30]. The proposed model was greatly superior to a single RF algorithm without effGS. Additionally, it significantly outperformed other snow depth products, e.g., AMSR2 and GlobSnow-v2.0. Our validation also showed that it can partially address some major issues presented in existing algorithms, e.g., underestimation for thick snow (>20 cm) and overestimation for thin snow cover (≤20 cm). However, the proposed model is seriously dependent on auxiliary data, e.g., snow and ground temperatures, snow density, and ground-based snow depth data. That is, this algorithm does not work if those auxiliary data are unavailable. In addition, the proposed model cannot monitor snow depth in near real time, which limits its applications in hydrological forecasting and water resource management. In our previous publication, we directly used the RF model in combination with the HUT model [30], but the performances of various other ML models in snow depth estimation were unknown.
Thus, the specific objectives of this follow-up study were to (1) compare the snow depth estimates from different ML methods, including support vector regression (SVR), artificial neural network (ANN), and RF, and determine which ML technique is more suitable for retrieving snow depth; (2) develop a simple operational snow retrieval algorithm (named the "pixel-based algorithm" in this study) based solely on space-borne PM data over China; and (3) compare the pixel-based algorithm with the GlobSnow-v3.0, AMSR2, and ERA5-land retrieval methods. The outline of this paper is as follows: Section 2 provides a description of the datasets and methodology of coupling the HUT model with the RF technique; Sections 3 and 4 provide the results and the discussion, respectively; and conclusions are given in Section 5.

Ground-Based Measurements
The weather station data consisted of eighteen years of winter snow (2012-2020) data that were collected from the National Meteorological Information Centre, China Meteorology Administration (CMA, http://data.cma.cn/en (accessed on 15 May 2020)). The spatial distribution of the meteorological stations is shown in Figure 1. The attribute parameters of each station include the site name, geolocation, and elevation. The daily measured variables consist of ground surface temperature, air temperature, latitude, longitude, elevation, and snow depth. In this study, station observations of snow from eight winters (2012-2020, corresponding to the AMSR2 period) were used to train the ML model and verify its performance. To maintain the training samples independent of validation data, all the stations were randomly separated into two parts, A and B (see Figure 1). Here, part A samples from 341 stations were used to train the ML models, while part B measurements from 342 stations, as spatially independent reference data, were employed to evaluate the well-trained ML algorithms and the proposed pixel-based algorithm (see Section 3.2).
Additionally, field campaign observations were collected to support the assessment of snow depth estimates. Four snow courses were designed to measure snow characteristics (snow depth, snow density, snow mass, and temperature) in Northeast China and northern Xinjiang [30]. Figure 1 shows the four snow survey routes, hereafter referred to as snow courses 1, 2, 5, and 6. Snow courses 1 and 2 were distributed around the Junggar Basin and the Tien Shan Mountains, respectively, in northern Xinjiang. Snow courses 5 and 6 were located in Northeast China. Snow course 5 stretched from Xilingol to the greater Khingan range, and snow course 6 included parts of the Changbai Mountains and Lesser Khingan Mountains. For snow courses 5 and 6, the land cover types were abundant, including grassland, farmland, barren land, and forest. The snowpit measurements were performed for three successive winters from 2018-2020, every 15 to 30 km along each snow course. All snowpit measurements within a satellite pixel were averaged as the ground truth value. Basically, there was only one snowpit measurement in each satellite pixel. The part B station observations during the 2012-2020 period, together with the snow course observations from 2018 to 2020, were used to assess the pixel-based algorithm (see Sections 2.4 and 3.4) proposed in this study and other snow depth products, e.g., assimilated GlobSnow-v3.0 data, remotely sensed AMSR2 estimates, and the reanalysis ERA5-land product.

Gridded Products
The AMSR2 instrument onboard the Global Change Observation Mission (GCOM) satellite of the Japanese Aerospace Exploration Agency (JAXA) has collected data since 2012 at frequencies of 6.9 GHz, 7.3 GHz, 10.7 GHz, 18.7 GHz, 23.8 GHz, 36.5 GHz, and 89.0 GHz with both V and H polarizations [44]. AMSR2 EASE-Grid Tb data (L3) from December 2012 to March 2020 were downloaded from http://gportal.jaxa.jp/gpr/ (accessed on 25 May 2020). Here, the AMSR2 Tb data were selected to train the ML models due to their advanced payloads, such as fine footprint size, which can reduce the uncertainties caused by the mixed pixels. Meanwhile, AMSR2 L3 Tb data were used to build the pixel-based algorithm and produce snow depth estimates during the 2012-2020 period.
To demonstrate the magnitude of improvement in snow depth estimation for the newly proposed algorithm, the retrieval results were compared with those of three globally published products (Table 1): the stand-alone satellite AMSR2 product, the assimilated GlobSnow-v3.0 retrieval results, and the reanalysis ERA5-land data. Note that the temporal resolution of the ERA5-land product is at the hour scale, and the data at two o'clock near the descending time of AMSR2 were selected. For the GlobSnow-v3.0 product, snow density was assumed to be a constant value of 240 kg/m 3 [45]. Thus, 240 kg/m 3 was applied to convert SWE to snow depth in this study. To maintain a uniform spatial resolution with AMSR2, the GlobSnow-v3.0 and ERA5-land products were also spatially resampled to 0.25 • × 0.25 • grid scale (Table 1).

ML Models
In this study, three widely used ML approaches (ensemble tree-based RF, kernelbased SVR, and neural network-based ANN) were selected to compare their performances in snow depth retrieval. Figure 2 illustrates the architectures of the three ML models. The RF algorithm is an ensemble ML algorithm proposed by Breiman in 2001 [46]. It combines several randomized decision trees and aggregates their predictions by averaging in regression. The default value of the number of trees in the ensemble (ntree) is 500. The number of random variables at each node (mtry) is typically set to the square root or 1/3 of the number of input variables for regression tasks.
SVR was initially developed to solve classification problems and then extended to regression tasks by introducing the ε-insensitive loss function [47,48]. ε is the termination criterion of the loss function; the default is 0.001. The key configurations of SVR include the punishment factor (c), kernel function selection, and gamma (g) parameter of the kernel function. Parameter c denotes the tolerance to error; typically, a high (low) c usually leads to an overfitting (underfitting) problem. The kernel is actually a mapping function that converts input predictor variables from low dimensions to high dimensions (hyperspace) so that samples can be separated easily. In general, the radial basis function (RBF) kernel is a reasonable first choice according to previous studies [49]. For the RBF kernel, the g parameter determines the distribution of high-dimensional data and typically defaults to the reciprocal of the number of input variables. In this study, we used the grid-search method to determine the optimal c and g parameters in the range of [2 −10 , 2 10 ].
The inspiration for designing the ANN learning process comes from the working mode of the human brain [50,51]. Currently, a multilayer perceptron (MLP) is one of the most popular ANN types and is widely used in various studies and applications [49,52]. The architecture of the ANN consists of input, hidden, and output layers ( Figure 2). The function of the input layer is to fetch the input elements and pass them to the first hidden layer of neurons. The output layer denotes the prediction results obtained by the final hidden layer that are then returned as the output of the ANN. The task of the hidden layer is to build the nonlinear relationship between the inputs and the outputs through a Levenberg-Marquardt backpropagation learning rule. The hidden layers are fully interconnected through a popular logistic sigmoid (logsig) function [25]. In this study, a relatively simple network structure (two hidden layers with 10 neurons) was used according to a previous study [25,53,54].

Workflow
The workflow of this study is shown in Figure 3. The first step is to optimize the effective grain size using HUT simulation and AMSR2 observation. Our previous study demonstrated that RF performance was greatly improved when effGS was involved in the training of predictor variables [30]. The effGS variable, as the most important predictor variable, was optimized by utilizing the HUT model by minimizing the difference between AMSR2 observations and HUT simulations at 18.7 and 36.5 GHz. The optimization procedure of effGS (ranging from 0 to 4 mm, with a step size of 0.01 mm) is described as follows: where SD denotes the station-based snow depth (cm), Tb 18.7V and Tb 36.5V denote the vertically polarized observations (K) at 18.7 and 36.5 GHz from AMSR2, respectively, d 0 is effGS (mm), ρ is the snow density (kg/m 3 ) from the ERA5-land product, and T snow is the snow surface physical temperature (K) provided by daily weather station observations. The second step is to determine predictor variables ( Figure 3). According to our previous study [30], the training predictor variables consist of (Tb 10.65V − Tb 36.5V ), (Tb 18.7V − Tb 36.5V ), elevation, longitude, and effGS (Table 2 and Figure 3). According to our study in [30], latitude is significantly correlated with elevation (negatively) and longitude (positively). Moreover, longitude has a large variation in snow cover areas, spanning from 75 • E to 134 • E ( Figure 1). Thus, longitude is more important to ML estimates than latitude. Although latitude was excluded from the predictor variables, elevation and longitude actually play a role in latitude. The third step is to select an optimal ML model ( Figure 3). To determine a more accurate reference snow depth dataset, three ML approaches, the RF, SVM, and ANN, were verified and compared ( Figure 3). Then, the best ML model was selected to provide a reference snow depth dataset for the pixel-based retrieval algorithm, in which fitting coefficients vary spatially pixel by pixel.
To assess three trained ML algorithms, a well-known k-fold cross-validation (k-CV) technique was performed. In this process, all samples were first randomly and equally divided into k subsamples. Then, (k − 1) randomly selected subsamples were used to train ML models, and the remaining subsample was used to evaluate the trained ML algorithms. This procedure was repeated k times to avoid evaluation uncertainty. The above samplebased k-CV method cannot guarantee that the training samples are both temporally and spatially independent from the validation data. Thus, two extended k-CV approaches, temporal-and spatial-based k-CV, were applied to assess the performances of well-trained ML algorithms [30]. For temporal-and spatial-based k-CV, all samples were randomly divided into k groups according to dates (time) and station location (space), respectively. In this study, k was set to 10, namely, 10-CV. In addition, the spatially independent data from part B stations (Figure 1) were used to further validate snow depth estimates retrieved by the ML model trained with observations from part A stations (Figure 2).
The fourth step is to build the pixel-based algorithm (Figure 3). The proposed algorithm referenced the traditional empirical format, but the fitting coefficient varies in each satellite pixel. In this study, the 89 GHz channel was excluded because it is more subject to atmospheric contributions. Considering the merits of 10.65 GHz in monitoring comparably thick snowpack, a Tb gradient (Tb 10 where SD denotes the estimated snow depth (cm). Tb 10.65V , Tb 18.7V , and Tb 36.5V denote the vertically polarized Tb observations at 10.65, 18.7, and 36.5 GHz. Vertical polarization was used in this study because the channels of V-pol are more sensitive to snow depth than the H-pol channels [29,55]. The slope and intercept are the fitting coefficients. In PAG1, a low frequency of 10.65 GHz was considered due to its strong penetrability to deep snow; that is, the Tb spectral difference at 10.65 GHz and 18.7 GHz can partially reflect thick snowpack. The PAG2 algorithm makes long-term snow depth production possible based solely on observations at two frequencies (18.7 and 36.5 GHz). The PAG1 and PAG2 algorithms were compared in snow depth retrieval using part B station observations, and then a suitable model was determined for monitoring snowpack. The last step is to validate the pixel-based algorithm estimates and compare them with those of the other three products (Figure 3). To demonstrate the advantages of the proposed pixel-based retrieval algorithm, three global snow depth datasets, the remotely sensed AMSR2 product, the assimilated GlobSnow-v3.0 product, and the reanalysis ERA5-land product, were compared to the new proposed algorithm in three stable snow cover areas across China using part B station observations and field campaign measurements (Figure 3).

Sensitivity of ML Models to Training Sample Size
One objective of this study was to compare the snow depth estimates based on three ML methods and determine which ML technique is more accurate in snow depth retrieval. To confirm the appropriate number of training samples, determination of the sensitivity of the three ML models to the training sample size is necessary. Thus, a test for three models trained with the same training samples (from 5000 to 50,000) was conducted during the 2015-2020 period. Then, the trained ML models were verified using station-observed snow depth data from 2012 to 2014. To demonstrate whether the sensitivity is affected by effGS, the trained ML models were compared based on predictor variables with and without effGS. Figure 4 shows the ML models' performances with increasing training sample size. The results show that any ML model considering the effGS variable performs better than that without effGS. Additionally, inclusion of effGS in the ML model leads to a stable performance, that is, a poor sensitivity to sample size. If effGS values are excluded from the training samples, the correlation coefficients increase with increasing sample size, and the unRMSE values decrease, especially for the ANN technique.

ML Model Performances
Based on the analysis in Section 3.1, the number of training samples was set to 30,000, which is large enough to ensure that the ML model performs stably and to make the cross-comparison much more objective. Figure 5 shows the validation of the SVR, ANN and RF algorithms with sample-, temporal-, and spatial-based 10-CV approaches. This demonstrates that these three ML algorithms perform similarly in snow depth retrieval, which is attributed to the inclusion of effGS in training samples (Figure 3). The effGS variable records the spatial-temporal variation in snow metamorphism and makes the ML model robust and stable.
The ML-based estimates were further verified with spatially independent station measurements from part B stations, whereas the ML retrieval models were trained with measurements from part A stations ( Figure 1). Figure 6 shows the validation and comparison results. Here, ANN1, SVR1, and RF1 represent the trained ML model without considering effGS, while ANN2, SVR2, and RF2 consider effGS. This result demonstrates that the ANN2-, SVR2-, and RF2-based estimates were in good agreement with the station observations, with a high R2 of 0.86 and a low unRMSE of approximately 3.4 cm. However, ANN1, SVR1, and RF1 present poor performances, with an unRMSE of up to 7.5 cm. Thus, the inclusion of effGS in predictor variables enhanced the predictive power of the ML models (Figures 4-6). In this study, the RF model was selected to retrieve the snow depth reference dataset for building a pixel-based retrieval algorithm due to its high computational efficiency relative to the computational efficiencies of the SVR and ANN methods.

Development of the Pixel-Based Algorithm
Based on the RF estimates (see Section 3.2) during the 2012-2020 period, the slope and intercept coefficients were determined for each pixel based on the linear fitting between Tb gradients and ML snow depth estimates ( Figure 7). Here, the snow cover detection method proposed by Li et al. [56] was used to identify dry snow pixels. Both the slope and intercept present high spatial heterogeneity (Figure 7). In Northeast China and northern Xinjiang, the slope varies smoothly, but the intercept changes sharply. The slope denotes the linear relationship between the Tb gradient and snow depth, while the intercept denotes the primary snow depth when the Tb gradient is zero. According to scattering theory, the snow depth should be 0 cm when the Tb gradient is equal to zero. In fact, saturation effects of microwave signals occur due to snow characteristics, such as snow depth and snow metamorphism, for example, under deep and mature snow conditions. As illustrated in Figure 7, the intercept coefficient is high in deep snow cover areas, such as the Tien Shan and Altai Mountains, Changbai Mountains, and Greater and Lesser Khingan Mountains. The slope correlation is very low and even negative in unstable snow cover areas, such as the Taklamakan Desert and the southern part of the Qinghai-Tibet Plateau (QTP).  To demonstrate the role of the 10.65 GHz channel in retrieving snow depth, two kinds of pixel-based algorithms (PAG1 and PAG2) are compared in Figure 8. Here, the part B station observations from 2012 to 2020 were treated as true data. Figure 8 shows that the PAG1 and PAG2 algorithms perform similarly in January, February, November, and December but present some differences in March, with correlation coefficients of 0.74 and 0.69, respectively. Figure 8 also shows that the PAG1 and PAG2 algorithms have similar performances in thin snow cover areas (less than 20 cm). However, PAG1 outperformed PAG2 under deep snow cover conditions (greater than 20 cm) because of the inclusion of the 10.65 GHz channel. Generally, the 10.65 GHz channel plays a significant role in snow depth estimation globally, but it is inapparent in China. Figure 9 shows the relationship between AMSR2observed Tb gradients and field-measured snow depth data along four snow courses (see Figure 1). Apparently, both (Tb 10.65V − Tb 18.7V ) and (Tb 18.7V − Tb 36.5V ) present an increase with increasing snow depth. However, (Tb 18.7V − Tb 36.5V ) is more sensitive to snow depth than (Tb 10.65V − Tb 18.7V ); that is, the small dynamic range of (Tb 10.65V − Tb 18.7V ) for snow depths is from 0 to 70 cm. Snow cover is generally shallow over China's flat areas; thus, both 10.65 GHz and 18.7 GHz channels can penetrate snow and achieve microwave emission of soil beneath snowpack. Although the snow cover over mountains is deep, the effects of the topography and mixed pixels disturb the relationship of the Tb gradient and snow depth. For example, snow course 2 is located in the Tien Shan Mountains. As illustrated in Figure 9, the relationship between the Tb gradient and snow depth is poor. Although the point-based measured snow depth is as high as 50-70 cm, the averaged snow depth in a pixel (0.25 • × 0.25 • ) is unknown; that is, there is a poor representativeness of so-called true data. Figure 9. Relationship between the satellite-observed Tb gradient and snow course-measured snow depth over China.

Evaluation of the Pixel-Based Algorithm and Comparison with Other Satellite Products
The pixel-based snow depth algorithm was verified using observations from part B stations during the 2012-2020 period and snow course measurements from 2017 to 2020. Additionally, it was compared to the global remotely sensed AMSR2, assimilated GlobSnow-v3.0 and reanalysis ERA5-land products. Figure 10 illustrates the overall performances of the four algorithms' estimates over China. The proposed pixel-based algorithm outperforms the other three methods, with a higher R 2 of 0.65 and lower unRMSE and MAE values of 5.82 cm and 4.21 cm, respectively. GlobSnow-v3.0 and ERA5-land have similar overall performances. The AMSR2 snow depth estimates present a serious overestimation over China, with the highest bias of 18.52 cm among these methods. Figure 11 shows the comparison and validation results of the pixel-based algorithm, AMSR2, GlobSnow-v3.0, and ERA5-land products in three stable snow cover areas (Northeast China, northern Xinjiang and the QTP) across China. The pixel-based algorithm outperforms the other three products in any snow cover area. The AMSR2 estimates tend to be higher than the station measurements in northern Xinjiang, with a bias of 13.83 cm. Additionally, the AMSR2 product presents a large uncertainty, with a high unRMSE of 22.47 cm. The ERA5-land product also presents a high error (unRMSE: 14.48 cm) in northern Xinjiang, but the averaged estimates tend to be close to the station measurements. Although the unRMSE of GlobSnow-v3.0 is 11.78 cm, the relationship between the estimates and station measurements is poor. In Northeast China, the AMSR2 and ERA5-land products present an overestimation for snow depths less than 50 cm, especially for AMSR2, with the highest bias of 21.14 cm. The pixel-based algorithm is superior to AMSR2 and ERA5-land products on the QTP. However, it tends to underestimate snow depth for conditions greater than 20 cm. The validation and comparison of the pixel-based algorithm and three products along with four snow courses are illustrated in Figures 12-15. For snow courses 1, 5 and 6, the pixel-based algorithm's estimates are in good agreement with the ground measurements (Figures 12-14). For snow courses 5 and 6, ERA5-land has a similar performance to the pixel-based algorithm (Figures 13 and 14). Figures 12-14 also demonstrate that ERA5-land estimates are more accurate in Northeast China than in northern Xinjiang. AMSR2 estimates are significantly related to the ground-measured snow depth but present an overestimation, especially for deep snow conditions. The GlobSnow-v3.0 product also presents a large uncertainty in northern Xinjiang and Northeast China, although it performs outstandingly when applied globally, based on previous studies [4,55].
The proposed pixel-based algorithm outperforms the other three products in flat and stable snow cover areas (Figures 12-14). However, deriving snow depth in mountains remains a challenge for space-borne PM remote sensing. Figure 15 shows the validation and comparison of snow depth estimates retrieved from the pixel-based algorithm and GlobSnow-v3.0, AMSR2, and ERA5-land products in the Tien Shan Mountains. Note that GlobSnow-v3.0 s sample size is smaller than that of the other three products because of its mountain mask. Here, the roughness of topography was calculated as the logarithm of the elevation's standard deviation within a 0.25 • × 0.25 • pixel, that is, roughness = log e (Stdev), with darker colors indicating more undulant terrain. Figure 15 shows that these four methods consistently present poor performance. The pixel-based algorithm tends to underestimate snow depth for deep snow conditions (>40 cm) where the roughness of terrain is high. AMSR2 and ERA5-land products can reflect deep snow, with large unRMSE values of 18.94 cm and 18.10 cm, respectively. GlobSnow-v3.0 estimates perform best among the four products, with an unRMSE value of 12.23 cm. However, some samples with unusually deep snow conditions are filtered out due to mountain masks. Figure 16 shows the spatial patterns of snow depth based on four retrieval algorithms over China. They all show that snow cover is thick in northern Xinjiang and Northeast China. The estimates from AMSR2 are the highest among these four methods, even over 50 cm. The spatial pattern of ERA5-land estimates is consistent with the topography, that is, thick snow in mountains and comparatively shallow snow in flat areas.

Discussion
ML techniques present outstanding performance in snow depth estimation when effGS, which denotes snow metamorphism, is included in the training sample (Figures 4-6). However, ML-based snow depth estimates are still disturbed by complex terrain. For example, our previous study demonstrated that unRMSE presents a significant increasing trend (from 4.7 cm to 19.6 cm) with increasing roughness, and the underestimation (from 2.6 cm to −9.8 cm) tends to be considerably serious [30]. Thus, the uncertainties from MLbased estimates are surely propagated to the proposed pixel-based algorithm in this paper.
According to the validation in Section 3, the pixel-based algorithm presented a better performance in northern Xinjiang and Northeast China than in the QTP (Figures 11-14). However, in the complex terrain areas, e.g., the Tien Shan Mountains, the pixel-based algorithm was completely ineffective, especially for conditions with depths greater than 40 cm (Figure 15). Snow depth estimation in complex mountains is challenged by two main problems. One is that ground-based measurements are generally sparse in remote areas and high mountains ( Figure 1). Moreover, the snow cover presents strong heterogeneity, especially for the snow depth, due to the complex topography. Thus, the point measurements are unrepresentative at a coarse pixel resolution (typically on the order of tens of kilometers). Another challenge is that microwave signals are affected not only by the snow mass but also by heterogeneous forest cover, terrain variability, and incomplete snow cover [57]. Synthetic aperture radar (SAR) remote sensing provides a promising prospect for mapping snow depth in mountains. For example, a recent work by Lievens et al. [58,59] attempted to retrieve snow depth in complex mountains using Sentinel-1 (SE1) SAR C-band observations. The results demonstrated that it provides much more accurate snow depth in Europe. Figure 17 shows the snow depth spatial distribution of the SE1 product and pixel-based estimates. Figure 17a shows the initial SE1 product at a 0.01 • × 0.01 • scale. Figure 17b shows the linearly resampled SE1 product at a 0.25 • × 0.25 • scale. The SCF was determined by the number of initial SE1 pixels in a 0.25 • × 0.25 • grid. Figure 17c shows the pixel-based estimates at a 0.25 • × 0.25 • scale. We also compared the results of SE1 estimates and station measurements in three complex mountains (Altai Mountains, Tien Shan Mountains, and QTP) over China (Figure 17d-f). Figure 17d-f shows that the SE1 estimates are much higher than the station observations, even in areas where the SCF approximately reaches 100%, indicating that snow thickness presents strong heterogeneity in a 0.25 • × 0.25 • pixel due to the complex topography. Thus, it is difficult to determine which dataset (station vs. SE1) represents the true snow depth in spatial pixels. For the station observation, its representativeness in a coarse pixel is unknown. For the SE1 estimates, the C-band is not feasible in comparatively shallow snow-covered areas due to its strong penetrability relative to the Ku-band (Yueh et al., 2009;King et al., 2015;Tsang et al., 2021). The snow grain sizes are typically 1-3 mm and are more than 20-55 times smaller than the C-band wavelength of 5.5 cm, which leads to slight volume scattering by snow grains in the C-band. At least, it is necessary to verify whether SE1 estimates can be treated as true data in mountains.
In the future, a combination of active sensors (C-band, X-band, and Ku-band) can be exploited to improve snow depth estimates in mountain areas [8], which can further support a quantitative assessment of the uncertainty in snow depth retrieval with PM observations. Additionally, automatic measurement networks, e.g., global navigation satellite system receivers (GNSS-R), satellite altimetry, light detection and ranging (LiDAR), can support studies of snow cover in remote areas and high mountains [60][61][62]. Figure 8 shows that the uncertainty of the proposed pixel-based retrieval algorithm increases from January to March, e.g., the unRMSE increases from 5.1 cm to 7.9 cm, and corr.coe decreases from 0.84 to 0.69. One reason is snow accumulation events, with the peak snow depth typically being reached in the middle of March. Another reason is snow metamorphism (grain size), which usually leads to stronger volume scattering than that of the snow depth [35]. In this study, the fitting coefficients of the pixel-based retrieval algorithm are dynamic at the spatial scale but fixed at the temporal scale, which neglects the scattering contribution caused by the snow grain evolution from the beginning to the end of the snow season. To reduce the errors caused by snow metamorphism, our ongoing work will attempt to build temporal (monthly) and spatial (pixel by pixel) dynamic retrieval algorithms based on ML estimates. However, this kind of method still cannot solve the problem thoroughly and theoretically because snow grains increase, even at the subhour scale. The optimum method is to develop an algorithm in which a specific index can decouple the scattering effects of snow grains and snow mass. In addition, the pixel-based method proposed in this paper is only suitable for retrieving snow depth in dry snow conditions. For dry snowpack, snow scattering typically dominates the signal at some frequencies, e.g., the K-and Ka-bands. Once snow contains liquid water, the penetration depth of electromagnetic waves decreases, and the snowpack absorbs radiation from the soil beneath the snowpack. Thus, wet snow also increases the uncertainty of the pixel-based algorithm.

Conclusions
Our previous study proposed a methodology that coupled HUT-optimized effGS parameters with the RF ML approach and demonstrated that it significantly improved snow depth estimates over China due to the inclusion of effGS in predictor variables (Yang et al., 2021). However, this method is a complex and time-consuming approach because input data is needed to drive the snow emission model to optimize effGS and then train the ML model.
This study further presents a new operational snow depth retrieval algorithm that implements ML-based estimates. The newly proposed snow depth estimation algorithm references the traditional empirical format, but the fitting coefficient varies in each satellite pixel. Thus, the new method is called the pixel-based retrieval algorithm in this study. To provide more accurate ML-based estimates for building pixel-based algorithms, we compared three widely used ML approaches (SVR, ANN and RF) over China. Meanwhile, we tested the sensitivity of these ML models to the training sample size. The results indicated that (1) the three ML approaches had significantly different performances if the predictor variables did not include effGS and presented sensitivity to the training sample size, especially for the ANN technique, and (2) the three ML models presented consistent performances in snow depth estimation if the predictor variables included effGS and simultaneously presented poor sensitivity to the training sample size.
The RF model was used to retrieve the snow depth reference dataset (2012-2020) in this study due to its slightly stronger robustness. The pixel-based algorithm was built based on the retrieved reference snow depth dataset and satellite Tb observations and then verified using ground-based observations from weather stations and field campaigns. Additionally, pixel-based algorithm estimates were compared to the global AMSR2, GlobSnow-v3.0 and ERA5-land products. The results indicated that the pixel-based algorithm exhibited overall unRMSE and R 2 values of 5.8 cm and 0.65, respectively, outperforming GlobSnow-v3.0, with unRMSE and R 2 values of 9.2 cm and 0.22, AMSR2, with unRMSE, and R 2 values of 18.5 cm and 0.13, and ERA5-land, with unRMSE and R 2 values of 10.5 cm and 0.33, respectively. Moreover, the pixel-based algorithm's estimates were closest to the ground measurements along snow courses 1, 5, and 6, with unRMSE values of 4.8 cm, 6.6 cm, and 5.2 cm, respectively.
The proposed pixel-based algorithm in this paper is a simple and operational method and can retrieve accurate snow depth based solely on space-borne PM observations in comparatively flat areas. The pixel-based algorithm presents a high uncertainty in the complex terrain, e.g., the unRMSE was up to 17.4 cm around the Tien Shan Mountains. Additionally, it presents a serious underestimation for deep snow conditions (>40 cm). We are attempting to improve snow depth estimates in mountainous areas by combining SAR C-, X-, and Ku-band observations and will present this endeavor in future work.

Conflicts of Interest:
The authors declare no conflict of interest.