Snow Depth Fusion Based on Machine Learning Methods for the Northern Hemisphere

: In this study, a machine learning algorithm was introduced to fuse gridded snow depth datasets. The input variables of the machine learning method included geolocation (latitude and longitude), topographic data (elevation), gridded snow depth datasets and in situ observations. A total of 29,565 in situ observations were used to train and optimize the machine learning algorithm. A total of ﬁve gridded snow depth datasets—Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) snow depth, Global Snow Monitoring for Climate Research (GlobSnow) snow depth, Long time series of daily snow depth over the Northern Hemisphere (NHSD) snow depth, ERA-Interim snow depth and Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) snow depth—were used as input variables. The ﬁrst three snow depth datasets are retrieved from passive microwave brightness temperature or assimilation with in situ observations, while the last two are snow depth datasets obtained from meteorological reanalysis data with a land surface model and data assimilation system. Then, three machine learning methods, i.e., Artiﬁcial Neural Networks (ANN), Support Vector Regression (SVR), and Random Forest Regression (RFR), were used to produce a fused snow depth dataset from 2002 to 2004. The RFR model performed best and was thus used to produce a new snow depth product from the fusion of the ﬁve snow depth datasets and auxiliary data over the Northern Hemisphere from 2002 to 2011. The fused snow-depth product was veriﬁed at ﬁve well-known snow observation sites. The R 2 of Sodankylä, Old Aspen, and Reynolds Mountains East were 0.88, 0.69, and 0.63, respectively. At the Swamp Angel Study Plot and Weissﬂuhjoch observation sites, which have an average snow depth exceeding 200 cm, the fused snow depth did not perform well. The spatial patterns of the average snow depth were analyzed seasonally, and the average snow depths of autumn, winter, and spring were 5.7, 25.8, and 21.5 cm, respectively. In the future, random forest regression will be used to produce a long time series of a fused snow depth dataset over the Northern Hemisphere or other speciﬁc regions.


Introduction
Snow cover is a fundamental component of the global energy and water cycles [1,2]. The extent and duration of the Northern Hemisphere snow cover have been substantially reduced as a result of the warming of surface temperatures [3]. Snow depth is an even more crucial parameter than snow cover area for climate studies, hydrological applications, weather forecasts, and disaster assessments [4][5][6][7][8]. However, reliable quantitative could improve precision. Xiao et al. [37] found that the SVM method performed well in snow depth retrieval from passive microwave brightness temperature, and auxiliary data and thus used it to generate a long time series of snow depth for the Northern Hemisphere [36]. Yang et al. [34] first used Random Forest (RF) to derive a long time series of a snow depth product that was more precise than the Che algorithm output [38]. RF was the most effective at reducing bias in SNOw Data Assimilation System (SNODAS) SWE in Ontario, Canada, with an absolute mean bias of 0.2 mm and RMSE of 3.64 mm when compared with in situ observations [39].
These papers demonstrated the potential of machine learning methods to produce more accurate snow depth estimates, but they did not incorporate existing snow depth products directly over the Northern Hemisphere. The existing snow depth datasets, which were produced via passive microwave brightness temperature and in situ observations or reanalysis data, are based on complex physics theory and production processes. Although these datasets have individual advantages, there is a strong need to fuse them into a new product that will incorporate their original characteristics. Regional climate models can provide higher-resolution snow depth information for specific regions. However the computational cost related to complex atmospheric physics schemes limits the production of a product with a long time series for the entire Northern Hemisphere [40]. The statistical downscaling snow depth is also appropriate only for specific small areas [25]. Previous studies have demonstrated the potential for using multiple snow depth products ensembles to improve the accuracy of snow depth datasets [8,10] and constrain uncertainty [9,11]. At present, machine learning provides a suitable approach to fuse snow depth datasets over large scales.
The objectives of this study include two aspects: (1) to test the performances of different machine learning methods on the snow depth data fusion, and (2) to produce high-quality snow depth data by using a suitable machine learning method based on five gridded snow depth datasets and in situ observations over the Northern Hemisphere. Section 1 presents the research background and significance. Section 2 introduces the datasets and methodologies. Section 3 compares the three machine learning methods and the fused snow depth dataset validation by independent in situ observations. Section 4 presents a discussion of the effect of the different input elements and compares it with the results of previous studies, and Section 5 summarizes this work.

Data
The snow depth datasets used in this study include three types: remote sensing snow depth datasets (i.e., AMSR-E, NHSD and GlobSnow), reanalysis snow depth datasets (i.e., MERRA-2 and ERA-Interim) and ground-based observations. The three remote sensing snow depth datasets and two reanalysis snow depth datasets were considered as independent variables (Table 1). Auxiliary data mainly include land cover data and topographical data. ρ represents snow density, 'SD' denotes the average snow depth in a 0.25 • × 0.25 • pixel, 'SWE' stands for the average snow water equivalent in one pixel, 'SD *' denotes the average snow depth in snow-covered area of a pixel and 'fsc' stands for fraction of snow cover in a pixel.

Remote Sensing Snow Depth Datasets
(1) AMSR-E Snow Depth Dataset The Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) is a passive microwave sensor onboard the Aqua satellite. The AMSR-E daily remote sensing snow depth dataset was acquired from the Japanese Aerospace Exploration Agency (JAXA, https://suzaku.eorc.jaxa.jp/, accessed on 9 February 2021). This dataset used an improved Chang algorithm that takes into account the forest fraction [41]. If the snow was considered shallow based on the passive microwave brightness temperature threshold detected, snow depth was set to 5 cm. When snow depth was deemed deep, the improved Chang algorithm was applied to retrieve the snow depth. This study used the AMSR-E daily snow depth dataset with a spatial resolution of 0.25 • . This dataset does not fully cover the entire Northern Hemisphere; striped gaps southward of 55 • N can be found in the daily images. We used the adjacent two days to complete the dataset before the fusion process. AMSR-E was launched in 2002 and discontinued in 2011. For this study we selected all the available data from 2002 to 2011.
(2) GlobSnow Snow Depth Dataset Global Snow Monitoring for Climate Research (GlobSnow) is a Northern Hemispheric SWE dataset from the European Space Agency (ESA, http://www.globsnow.info/swe/, accessed on 9 February 2021) that was based on the assimilation of satellite microwave radiometer data and weather station data [7]. This method assimilates the daily in situ snow depth into the Helsinki University of Technology (HUT) snow microwave emission model to improve the simulation accuracy, obtaining more accurate snow parameters. The spatial coverage of GlobSnow SWE is 35 • N~85 • N, with an original spatial resolution of 25 km × 25 km. In this study, the GlobSnow SWE product was transformed into snow depth by dividing it by a constant snow density of 0.24 g/cm 3 . The snow depth dataset was resampled to a spatial resolution of 0.25 • × 0.25 • to match the other datasets. Some days for the month of September were not calculated as the data were missing in the GlobSnow SWE product. The spatial coverage of all snow depth datasets was limited to 35 • N~85 • N, matching the spatial coverage of the GlobSnow dataset.
(3) NHSD Snow Depth Dataset Long time series of daily snow depth over the Northern Hemisphere (NHSD) [38,42] are available from the national Tibetan plateau data center (TPDC, https://poles.tpdc.ac. cn/, accessed on 9 February 2021). The dataset was produced based on multiple-sensor passive microwave brightness temperature data using a modified Chang algorithm [32]. This is a dynamic algorithm that was developed at a pixel scale based on in situ snow depth observations. For every available pixel, a linear equation between in situ snow depth and the brightness temperature gradient in each month was built. The coefficients of these Remote Sens. 2021, 13, 1250 5 of 23 equations were interpolated to all pixels in the Northern Hemisphere. In forested areas, the forest cover fraction was used to decrease the influence of forests. Besides, to improve the temporal consistency of the long time series of brightness temperatures, an inter-sensor calibration was performed between neighboring sensors.

Reanalysis Snow Depth Datasets
(1) ERA-Interim Snow Depth Dataset ERA-Interim [22] is the fourth generation of reanalysis data from the European Center for Medium-Range Weather Forecasts (ECMWF, https://apps.ecmwf.int/datasets/data/ interim-full-daily/, accessed on 9 February 2021). The snow-related parameters of ERA-Interim are derived from the hydrology tiled ECMWF schemes (TESSEL) for surface exchange over land. In this study, snow depth was calculated from snow density and SWE. The SWE and snow density datasets were downloaded with a resampled spatial resolution of 0.25 • × 0.25 • and temporal resolution of 6 h. First, the average 6-hourly snow depths were calculated from the SWE and snow density; then, these snow depths were averaged to daily data.
(2) MERRA-2 Snow Depth Dataset The Modern Era Retrospective Analysis for Research and Applications [43], Version 2 (MERRA-2) is produced by the Global Modeling and Assimilation Office of NASA (GMAO, https://disc.gsfc.nasa.gov/datasets/, accessed on 9 February 2021). MERRA-2 offers several atmospheric and surface key variables on a global scale. The land surface model of Catchment [43] was applied in MERRA-2. Based on the average snow depth of the snow-covered area in a pixel and snow cover fraction, we derived the mean snow depth of the pixel. The original spatial resolution of this product was 0.5 • × 0.625 • . We resampled its spatial resolution to 0.25 • × 0.25 • by nearest interpolation in order to match the other datasets.

Ground-Based Measurement
Ground-based observations were used to construct and validate the machine learning snow depth fusion models. There are four sources of ground observations, including the meteorological station data from China and Russia, snow survey data from Russia, and the Global Historical Climatology Network (GHCN) daily dataset. In this study, we selected the observations available for the period from 2002 to 2011.
Daily snow depth of China was collected from the national meteorological information center of the Chinese Meteorological Administration (http://data.cma.cn/, accessed on 9 February 2021), with 923 stations used in this study. This dataset offers daily snow depth, location, and elevation of the station. Daily snow depth is manually observed at 8:00 a.m. with a ruler. These data were calibrated and quality checked before they were released on the national meteorological data platform.
Daily snow depth from Russia from 2002 to 2011 was derived from the Russian meteorological center (http://aisori.meteo.ru/ClimateR, accessed on 9 February 2021). Snow depth, location and elevation of the station, and the quality control field were obtained from the dataset. Anomalous records were marked out in this dataset during the quality check. After screening, there 576 stations remained in this dataset.
The snow survey data of Russia were also obtained through the Russian meteorological center (http://aisori.meteo.ru/, accessed on 9 February 2021). This field survey dataset contains the snow depth (i.e., deepest snow depth, shallowest snow depth, and average snow depth), snow density every 5 to 10 days from September to May. Snow depth larger than the deepest snow depth or less than the lowest snow depth was regarded as anomalous and eliminated in this study. Finally, 514 efficient stations remained for this study.
The Global Historical Climatology Network (GHCN) daily dataset (https://data.nodc. noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00861, accessed on 9 February 2021) provides Remote Sens. 2021, 13, 1250 6 of 23 snow depth data and elevation. Data in this dataset were filtered according to the quality control field. Data that failed in internal consistency check, climatological outlier check, spatial or temporal check, etc., were marked out in the quality control field and removed. Finally, 27,552 stations remained for this study.

Auxiliary Data
(1) Reclassification of Land Cover Data The land cover data used in this study were from GlobCover 2009 (https://due.esrin. esa.int/page_globcover.php, accessed on 9 February 2021), which was produced by ESA and the Université Catholique de Louvain (UCL). GlobCover2009 includes 23 class types according to the United Nations Land Cover Classification System (LCCS) ( Table A1, Appendix A). The land cover types of GlobCover 2009 were reclassified into five classes, forest, shrub, prairie, bare land, and unclassified. When executing the machine learning snow depth fusion algorithm, the type unclassified was excluded from the calculation. The original spatial resolution of GlobCover is 300 m × 300 m. In order to match the spatial resolution of the snow-depth datasets, the data were resampled into a grid of 0.25 • × 0.25 • and the land type covering the largest proportion in a grid was assumed as the true land cover. We reclassified the original GlobCover2009 into five classes ( Figure 1; Table A1, Appendix A), consistently with previous studies [37]. Snow depth fusion models were established in the area of forest, grassland, shrub, and bare-land, respectively. (1) Reclassification of Land Cover Data The land cover data used in this study were from GlobCover 2009 (https://due.esrin.esa.int/page_globcover.php, accessed on 09 February 2021), which was produced by ESA and the Université Catholique de Louvain (UCL). GlobCover2009 includes 23 class types according to the United Nations Land Cover Classification System (LCCS) ( Table A1). The land cover types of GlobCover 2009 were reclassified into five classes, forest, shrub, prairie, bare land, and unclassified. When executing the machine learning snow depth fusion algorithm, the type unclassified was excluded from the calculation. The original spatial resolution of GlobCover is 300 m × 300 m. In order to match the spatial resolution of the snow-depth datasets, the data were resampled into a grid of 0.25° × 0.25° and the land type covering the largest proportion in a grid was assumed as the true land cover. We reclassified the original GlobCover2009 into five classes ( Figure 1; Table  A1), consistently with previous studies [37]. Snow depth fusion models were established in the area of forest, grassland, shrub, and bare-land, respectively. (2) Topographic Data Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) archieved in, https://topotools.cr.usgs.gov/gmted_viewer/ (accessed on 09 February 2021), is an update of GTOPO30, and is produced by the United States Geological Survey (USGS). GMTED offers three spatial resolutions: 30 arc-seconds, 15 arc-seconds, and 7.5 arc-seconds [44]. In this study, the data with a spatial resolution of 30 arc-seconds were resampled into the grid of 0.25° × 0.25° used for snow depth fusion.

Methodology and Experimental Design
In this paper, three widely used machine-learning methods (i.e., Random Forest Regression (RFR), Support Vector Regression (SVR), and ANN) were adopted. This section provides a general description of the three machine learning methods, experimental design and assessment index of model performance.

Machine Learning Methods
ANN is typically composed of interconnected neuronal units organized in layers and can be used in problems of classification and regression [45]. In this study, we applied the backpropagation artificial neural network (BP-ANN). Generally, the BP-ANN model has three layers, namely the input layer, hidden layer and output layer. The input variables (2) Topographic Data Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) archieved in, https://topotools.cr.usgs.gov/gmted_viewer/ (accessed on 9 February 2021), is an update of GTOPO30, and is produced by the United States Geological Survey (USGS). GMTED offers three spatial resolutions: 30 arc-seconds, 15 arc-seconds, and 7.5 arc-seconds [44]. In this study, the data with a spatial resolution of 30 arc-seconds were resampled into the grid of 0.25 • × 0.25 • used for snow depth fusion.

Methodology and Experimental Design
In this paper, three widely used machine-learning methods (i.e., Random Forest Regression (RFR), Support Vector Regression (SVR), and ANN) were adopted. This section provides a general description of the three machine learning methods, experimental design and assessment index of model performance.

Machine Learning Methods
ANN is typically composed of interconnected neuronal units organized in layers and can be used in problems of classification and regression [45]. In this study, we applied the backpropagation artificial neural network (BP-ANN). Generally, the BP-ANN model has three layers, namely the input layer, hidden layer and output layer. The input variables were propagated from the input layer to the output layer through the hidden layer, while the error was transmitted in the opposite direction, thereby correcting the connection weight of the network [46]. A neural network structure consists of a transfer function, a learning algorithm, many hidden layers, training and predicting datasets [47]. In this work, the transfer functions tan-sigmoid and purelin were applied from the input layer to the hidden layer and from the hidden layer to the output layer, respectively. A combination of a gradient descent method and the Gauss-Newton method was adopted as the learning algorithm. Support Vector Regression (SVR) is a supervised learning algorithm for regression [48,49]. SVR relies on establishing a regression function, and SVR is a statistical learning theorybased machine learning formalism. In the SVR model, the input variables will be first mapped into a high-dimensional feature space using a kernel function, either linear or nonlinear depending on the relationship between the dependent and independent variables. Then, a linear model is constructed in the feature space to balance between minimizing errors and overfitting [50]. SVR is gaining popularity because of its many attractive features and promising generalization performance. SVR considers an input vector and the number of geophysical variables at a given location in space and time. Selecting a suitable kernel function is very important in this method. In this study, the radial kernel function was chosen for model training and prediction.
Random Forest Regression (RFR) is an ensemble learning technique that combines a large set of decision trees for regression, and each regression tree is independent of others [51]. Several randomized decision trees aggregate their predictions via regression [52]. The RFR generally only requires two user-defined parameters, the number of trees in the ensemble, and the number of random variables at each tree. The RFR model has been widely used in remote sensing information extraction because of its high flexibility and precision. As RFR compensates the bias brought by a single decision tree through the randomness, RFR does not easily over-fit and has extremely high accuracy and fast training speed; thus, it is suitable for dealing with big data. In this paper, we used the randomForest R package on the cloud platform supported by the Big Earth Data Science Engineering Project of Chinese Academy of Science (CASEarth) (http://workbench.casearth.cn/, accessed on 9 February 2021).

Experimental Design
Based on previous assessments, the performance of different snow depth products shows inconsistencies in different seasons and landcover types [11,15,17]. A complete snow cover year was defined as the period between September of the previous year (t-1) and May of the current year (t). Additionally, a complete snow year was divided into three snow-seasons [16,53]: autumn (September to November), winter (December to February) and spring (March to May). In this study, seasonal information and land cover types were used as a priori conditions to form 12 models. In total, three machine learning algorithms were applied in this study, thus extending these 12 models to 36 models ( Figure 2).
In these models, the input variables included AMSR-E, NHSD, GlobSnow, MERRA-2, ERA-Interim, latitude, longitude, and elevation. In the phase of model training, the independent input variables were Latitude, Longitude, Elevation, AMSR-E, NHSD, GlobSnow, ERA-Interim, and MERRA-2 snow depth datasets, and the dependent variable was the set of in situ observations. In the phase of the model prediction, the input variables were Latitude, Longitude, Elevation, AMSR-E, NHSD, GlobSnow, ERA-Interim, and MERRA-2 snow depth datasets; the dependent variable was the predicted snow depth. Because of the large numbers of samples in an entire hydrological year (

Forest
Bare-land Accuracy assessment of the fused snow depth dataset In these models, the input variables included AMSR-E, NHSD, GlobSnow, MERRA-2, ERA-Interim, latitude, longitude, and elevation. In the phase of model training, the independent input variables were Latitude, Longitude, Elevation, AMSR-E, NHSD, GlobSnow, ERA-Interim, and MERRA-2 snow depth datasets, and the dependent variable was the set of in situ observations. In the phase of the model prediction, the input variables were Latitude, Longitude, Elevation, AMSR-E, NHSD, GlobSnow, ERA-Interim, and MERRA-2 snow depth datasets; the dependent variable was the predicted snow depth. Because of the large numbers of samples in an entire hydrological year (Table 2)  The selected datasets were used for model training and prediction. All the input variables and in situ observations were normalized to have a mean of 0 and a standard deviation of 1 [54]. The selected samples were divided into two parts, 80% of the samples were used for training the model, and the rest 20% were used for the model prediction (20%).
These 36 fused snow-depth models were evaluated by the coefficient of determination (R 2 ), root mean square error (RMSE) and mean absolute error (MAE). We also calculated the bias of in situ observations and fused values (BIAS) to evaluate the spatial error between the fusion dataset and observations: where n is the number of sample pixels, Si andŜi denote the in situ observation and fused snow depth values of the i-th pixel, respectively. S represents the mean value of in situ observations of n pixels. The optimal machine learning method was selected to fuse the snow depth dataset. A "leave-one-year-out" cross-validation for each divided dataset was conducted to determine the performance of the optimal method in continuous time series estimation of snow depth. Finally, a time series comprehensive snow depth dataset covering the Northern Hemisphere was derived from 2002 to 2011.

Comparison among the Fused Snow Depth from Three Machine Learning Methods
In this study, we applied three machine learning models and four land cover classes to design the 12 models. We also divided the whole hydrological year into three seasons (autumn (September to November), winter (December to February), and spring (March to May)) to derive 36 pairs of accuracy assessment indices. The results of these 36 snow-depth fusion models are presented in Table 2. In the model comparison phase, the input variables for RFR, SVR, and ANN were the same. In the same season and same land cover type, the RFR model had a higher R 2 , and lower RMSE and MAE, indicating that the RFR model was preferable over ANN and SVR. Especially in March to May, RMSE and MAE's values decreased substantially compared to those of ANN and SVR. The calculation of RFR was also more efficient than that of ANN and SVR. Therefore, the RFR model was used to produce the fused snow depth data over the Northern Hemisphere from 2002 to 2011.

Comparison between the Fused Dataset and Five Other Snow Depth Datasets Based on Observations
The fused snow depth dataset shows better R 2 , RMSE, and MAE than the five original snow-depth datasets. The fused results indicate that the original five snow-depth datasets have weak correlations with the observed snow-depths. The RFR fusion, therefore, significantly improves the accuracy of the snow-depth datasets. The R 2 increases to 0.91, the RMSE and MAE decrease to 5.5 and 2.2 cm, respectively ( Figure 3). Based on the accuracy assessment, we found that the snow depth dataset fused with the RFR algorithm is very consistent with the station observations. Remote Sens. 2021, 13, x FOR PEER REVIEW 11 of 24

Comparison Between the Fused Dataset and Five Other Snow Depth Datasets Based on Observations
The fused snow depth dataset shows better R 2 , RMSE, and MAE than the five original snow-depth datasets. The fused results indicate that the original five snow-depth datasets have weak correlations with the observed snow-depths. The RFR fusion, therefore, significantly improves the accuracy of the snow-depth datasets. The R 2 increases to 0.91, the RMSE and MAE decrease to 5.5 and 2.2 cm, respectively ( Figure 3). Based on the accuracy assessment, we found that the snow depth dataset fused with the RFR algorithm is very consistent with the station observations.  This result indicates that most pixels have snow-depths of less than 50.0 cm, and many pixels have a very high fusion accuracy. The spatial distribution of the BIAS shows that 89.85% of the pixels (9293 pixels out of a total of 10,343) have a BIAS between −5.0 The fused snow depth values are distributed near the 1:1 line (Figure 4). This result indicates that most pixels have snow-depths of less than 50.0 cm, and many pixels have a very high fusion accuracy. The spatial distribution of the BIAS shows that 89.85% of the pixels (9293 pixels out of a total of 10,343) have a BIAS between −5.0 and 5.0 cm ( Figure 5).
The fused results also show that only very few pixels have a BIAS value greater than 5.0 cm, or less than −5.0 cm. The spatial pattern of the BIAS ( Figure 6) exhibits a good consistency with station observations over the Northern Hemisphere, especially at low latitudes.  This result indicates that most pixels have snow-depths of less than 50.0 c many pixels have a very high fusion accuracy. The spatial distribution of the BIAS that 89.85% of the pixels (9293 pixels out of a total of 10,343) have a BIAS betwe and 5.0 cm ( Figure 5). The fused results also show that only very few pixels have a BIAS value greater than 5.0 cm, or less than −5.0 cm. The spatial pattern of the BIAS ( Figure 6) exhibits a good consistency with station observations over the Northern Hemisphere, especially at low latitudes. The fused results also show that only very few pixels have a BIAS value greater than 5.0 cm, or less than −5.0 cm. The spatial pattern of the BIAS ( Figure 6) exhibits a good consistency with station observations over the Northern Hemisphere, especially at low latitudes.

Accuracy Assessment of the Fused Snow Depth Dataset at Five Independent In Situ Snow Observation Sites
To further verify the accuracy of the fused snow depth dataset, five independent snow sites recommended by the Earth System Model-Snow Model Intercomparison Project (ESM-SMIP) were selected for validation. The detailed information of these five sites is in Table A2. These sites have in situ snow depth and SWE data. The series of snow depth data were extracted from the fused snow dataset and observations using overlapping time

Accuracy Assessment of the Fused Snow Depth Dataset at Five Independent In Situ Snow Observation Sites
To further verify the accuracy of the fused snow depth dataset, five independent snow sites recommended by the Earth System Model-Snow Model Intercomparison Project (ESM-SMIP) were selected for validation. The detailed information of these five sites is in Table A2, Appendix A. These sites have in situ snow depth and SWE data. The series of snow depth data were extracted from the fused snow dataset and observations using overlapping time series (Figure 7).  From the accuracy assessment of these five sites, Sodankylä(SOD) [55] performed best with R 2 , RMSE and BIAS of 0.88, 8.6 and 4.0 cm, respectively (Figure 7a). Although the accuracies of the Old Aspen(OAS) [56] and Reynolds Mountain East(RME) [57] are not as good as those of Sodankylä, they are still within the accepted scope. Swamp Angel Study Plot (SASP) [58] and Weissfluhjoch(WFJ) [59] sites, which have deeper snow depth values, do not have a good performance; the fused snow depth shows a prominent underestimation. The five original gridded snow depth products ( Figure A1, Appendix A) all have a distinct underestimation, indicating that the input variables of the RFR model are very important to the fused results. The geographical location and topographic conditions are complex in SASP. SASP is in a sheltered area and is surrounded by a sub-alpine forest. Therefore, it experiences lower winds and is better suited for measuring precipitation. Under these conditions, while deeper snow develops in SASP, this also leads to a distinct underestimation of snow depth based on the remote sensor. The WFJ site is located in an almost flat part of a southeasterly oriented slope and at an altitude of about 2540 m [59]. During the winter months, deeper snow builds up at this altitude. In future work, the RFR modeled dataset should be improved based on this validation.

The Spatial Distribution of the Fused Snow Depth Dataset Based on Random Forest Regression
Based on the daily fused snow depth dataset, we derived the monthly average, seasonal average, and yearly average snow depth. The multi-year average snow depth and different seasonal average snow depth were also calculated.
The spatial distribution of the fused daily snow depth product over the Northern Hemisphere from 2002 to 2011 is shown in Figure 8a. The spatial pattern of the multi-year average snow depth over the Northern Hemisphere indicates that the regions with deep snow are distributed in the Northern American and Siberian plain. This spatial pattern also reveals that most parts of the mid and low latitude regions (<50 • N) have a shallower snow depth of less than 5.0 cm. The snow depth in China is relatively shallow (<5.0 cm). The fused snow-depth in autumn (Figure 8b) is smaller than in winter ( Figure 8c) and spring (Figure 8d). The average snow depths of autumn, winter and spring are 5.7, 25.8 and 21.5 cm, respectively. In autumn, snow depth varies from 0 to 66.4 cm, and snow depths of most regions are less than 20.3 cm. In winter, most of the regions have deeper snow, which varies from 0 to 258.0 cm. In spring, the spatial pattern of snow depth is similar to that in winter. Because the new algorithm incorporates the advantage of remote sensing and reanalysis products, the range of the fused snow depth varies from 0 to more than 200 cm.
Based on the daily fused snow depth dataset, we derived the monthly average, seasonal average, and yearly average snow depth. The multi-year average snow depth and different seasonal average snow depth were also calculated.
The spatial distribution of the fused daily snow depth product over the Northern Hemisphere from 2002 to 2011 is shown in Figure 8a. The spatial pattern of the multi-year average snow depth over the Northern Hemisphere indicates that the regions with deep snow are distributed in the Northern American and Siberian plain. This spatial pattern also reveals that most parts of the mid and low latitude regions (<50° N) have a shallower snow depth of less than 5.0 cm. The snow depth in China is relatively shallow (<5.0 cm). The fused snow-depth in autumn (Figure 8b) is smaller than in winter ( Figure 8c) and spring (Figure 8d). The average snow depths of autumn, winter and spring are 5.7, 25.8, and 21.5 cm, respectively. In autumn, snow depth varies from 0 to 66.4 cm, and snow depths of most regions are less than 20.3 cm. In winter, most of the regions have deeper snow, which varies from 0 to 258.0 cm. In spring, the spatial pattern of snow depth is similar to that in winter. Because the new algorithm incorporates the advantage of remote sensing and reanalysis products, the range of the fused snow depth varies from 0 to more than 200 cm.

The Effect of Seasons on the Fused Snow Depth Dataset
In the current study, the snow year was divided into three seasons for the machine learning training and predicting phases, which was consistent with previous studies [16,37,53]. To compare the performance of different machine learning methods in the fusion of the snow depth datasets in different seasons and land cover types, the three seasons were assessed independently. In autumn, the snow depth is shallow (<5.0 cm), so the fused

The Effect of Seasons on the Fused Snow Depth Dataset
In the current study, the snow year was divided into three seasons for the machine learning training and predicting phases, which was consistent with previous studies [16,37,53]. To compare the performance of different machine learning methods in the fusion of the snow depth datasets in different seasons and land cover types, the three seasons were assessed independently. In autumn, the snow depth is shallow (<5.0 cm), so the fused results perform much worse than in winter and spring (Table 2). In autumn, while the absolute values of RMSE and MAE were smaller given the shallower snow depth, the R 2 had the worst performance. Overall, the machine learning fusion algorithms performed better in deeper snow (>10.0 cm). Currently, the fused snow depth dataset based on the machine learning algorithm has higher accuracy in winter (December to February) and spring (March to May) than in autumn (September to November). The average snow depth over the Northern Hemisphere from 2002 to 2011 varies from 0.7 to 34.6 cm (Figure 9).
In winter and spring, the average snow depth is deeper and stable, which may reduce some uncertainties, so the fused snow depth performs well in these two seasons. In autumn, the snow depth is shallow and increases slowly. The snow depth retrieval scheme based on remote sensing microwave brightness temperature is based on a set threshold (e.g., 5.0 cm in GlobSnow). This threshold also leads to some uncertainties. The accuracy assessment indicates that when the snow depth exceeds about 10.0 cm, the fused snow depth performs better (Figure 7).

Improvement between the Current Study and Previous Work
Previous studies retrieved snow depth employed machine learning algorithms combining brightness temperature with other auxiliary data. The applications of machine learning models have improved the estimation accuracy of snow depth [37] and SWE [60] over the Northern Hemisphere. Tedesco and Jeyaratnam [60] derived a densification formula using Bayesian statistics for SWE estimations from passive microwave brightness temperature observations based on in situ snow depth, density, and SWE for each snow climate class. NASA's current V2.0 AMSR-E SWE algorithm utilizes an artificial neural network, snow emission modeling, and climatological snow depth data for the estimation of snow depth and the detection of dry versus wet snow conditions [60].
Our study differed mainly with regards to the selected input variables. Xiao et al. [37] employed SVM using passive microwave brightness temperatures and auxiliary data to derive long time series of daily snow depth over the Northern Hemisphere. The evaluation of this snow depth product with the other two snow cover products (GlobSnow and ERA-Interim/Land) showed that it performs comparably well with relatively high accuracy [36].
Snauffer et al. [8] used ANN combined with some gridded snow datasets to derive a comprehensive SWE product. In this study, five snow depth datasets were introduced to integrate a new snow depth product based on machine learning methods.
As described in the aforementioned studies, using machine learning methods and brightness temperature can improve the accuracy of the snow depth inversion. Brightness temperature data of some sensors have striped gaps resulting in missing data in some areas [41]. However, the snow depth product provides continuous coverage and avoids missing data. In the current study, the fused accuracy improved because the input variables were already snow depth products.

Determining the Input Parameters of the RFR Model
In the experimental design, all gridded snow depth products and auxiliary data were considered as the input parameters for machine learning methods (Section 2.2.2). To test the importance of these input variables for the model, a scheme (Table 3) was designed to verify the result.  As described in Table 3, one input variable was deleted in every scheme, and then the importance of that variable was evaluated. In the designed schemes, when deleting some variables, the accuracy of the fused snow depth makes a difference, but these changes are not significant based on the RMSE and MAE (Table 4). Although the ERA-Interim model occupied the second rank in terms of accuracy performance, it was excluded. We can't conclude that the ERA-Interim was not important for the model. Accuracy assessments of gridded snow depth datasets indicated that ERA-Interim exhibits overall better agreements with in situ observations than other datasets [9]. The rank of our input snow depth datasets did not indicate that this variable was important or nonsignificant for the model. The model was best when all variables were inputted into the fused model. Therefore, we selected as many gridded snow depth products as possible to pursue the most accurate results.
The result of Table 4 used the training samples of 2002-2003, and the model prediction phase applied the samples of 2003-2004. The land cover type was Bare-land, and the time period was December to February. "All variables" indicates that all variables were used as input elements, "Elevation excluded" means that the input variables did not contain Elevation, etc. By deleting the input variables in the RFR model one by one, the results showed that the RFR model is a more stable machine learning model and that it could be used to produce a long time series of snow depth product.

Limitations of the Current Study
As described in previous relevant papers, machine learning methods could overcome various complex problems existing in large-scale retrievals [27][28][29]. Machine-learning methods can learn and summarize a large number of data and not rely on the understanding of physical processes when modeling [26]. Although the fused snow depth dataset performs well in accuracy assessment via five independent in situ observations, there are still some limitations that warrant further improvements. First and foremost, these snow depth datasets were not comprehensively evaluated before data fusion. According to the accuracy assessment in the previous study [9], the NHSD, AMSR-E, GlobSnow, ERA-Interim, and MERRA-2 gridded snow depth datasets were selected for direct fusion. Secondly, the input variables in this paper include three geolocation and topographic factors and five gridded snow depth datasets. Machine learning was applied in the Northern Hemisphere to fuse the snow depth datasets. Although many in situ observations were used to train and validate models, the location of these stations was still sparse. This algorithm can be applied to obtain a high precision snow depth dataset in regions with more dense observation sites. The model should be modified before applying it to specific regions and the appropriate input variables should be selected according to the regional conditions. Additionally, the input variables should be consistent between model training and prediction phase. Thirdly, the fused snow depth was only validated by five independent in situ observations; more in situ stations should be introduced to more thoroughly assess this snow depth dataset. Lastly, we only compared the machine learning models based on the same input samples. The RFR model had a high R 2 , and lower RMSE and MAE, indicating that the RFR model was more advanced than ANN and SVR. The different land cover type and seasons were not considered. In future work, samples of the different land cover types and seasons should be statistically analyzed before being input into the machine learning model ( Table 2).

Conclusions
This study examined three machine learning algorithms (ANN, SVR, and RFR) to fuse the snow depth datasets over the Northern Hemisphere. By comparing the performance of three machine learning methods in 36 models, the models with higher R 2 , smaller RMSE and MAE were selected to fuse the snow depth dataset. The fused snow depth dataset has a high accuracy compared with other snow depth datasets. The main conclusions are: (1) Comparing the performance of the SVR, ANN, and RFR algorithms in 36 models, the RFR algorithm has a higher R 2 , smaller RMSE and MAE. (2) The fused dataset based on the RFR model performed better in winter and spring than autumn because there were more training samples in winter and spring; the average snow depth values in winter and spring were deeper than in autumn.
(3) Comparing AMSR-E, NHSD, GlobSnow, MERRA-2, ERA-Interim, and the new fused snow depth datasets with in situ observation snow depths, the result shows that the original five snow-depth datasets have weak correlations with the observed snow depth. The best coefficient of determination between the five original snow depth products and the observations was 0.15 (i.e., the coefficient of determination between GlobSnow and in situ observations), while the value of the fused snow depth increased to 0.91. The spatial pattern of BIAS between fused dataset and observations indicates that the fused dataset performs very well. The comparison of the fused snow depth product with five independent in situ snow observation sites shows that it is the most accurate. However, in some complex situations with deeper snow depths (>200 cm), like in alpine regions and mixed pixel areas, the fused snow depth also does not perform well.
This paper proposed a new data fusion method that was applied to derive a fused snow depth product across the Northern Hemisphere from 2002 to 2011. There is a slight drawback to the fused snow depth dataset, mainly regarding its spatial coverage. The spatial coverage of the GlobSnow is from 35b • N to 85 • N from 2002 to 2011 so that the spatial coverage of the fused dataset does not cover all the North Hemisphere. In future work, other snow depth datasets (i.e., AMSR-E, NHSD, MERRA2, and ERA-Interim) should be used to fill the missing regions and to produce a fused snow depth product of the Northern Hemisphere by using the random forest method.
Appendix A Table A1. Land cover type reclassification of the GlobCover2009 classification system.

Value
Original Class Reclassification Type