Snow Depth Fusion Based on Machine Learning Methods for the Northern Hemisphere

Hu, Yanxing; Che, Tao; Dai, Liyun; Xiao, Lin

doi:10.3390/rs13071250



¹ Key Laboratory of Remote Sensing of Gansu Province, Heihe Remote Sensing Experimental Research Station, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Science, Lanzhou 730000, China

² CAS Center for Excellence in Tibetan Plateau Earth Sciences, Chinese Academy of Sciences, Beijing 100049, China

³ College of Resources and Environment, University of Chinese Academy of Science, Beijing 100049, China

⁴ National Forestry and Grassland Administration Key Laboratory of Forest Resource Conservation and Ecological Safety on the Upper Reaches of the Yangtze River, Sichuan Province Key Laboratory of Ecological Forestry Engineering on the Upper Reaches of the Yangtze River, College of Forestry, Sichuan Agricultural University, Chengdu 611130, China

* Author to whom correspondence should be addressed.

Remote Sens. 2021, 13(7), 1250; https://doi.org/10.3390/rs13071250

Submission received: 10 February 2021 / Revised: 15 March 2021 / Accepted: 22 March 2021 / Published: 25 March 2021

(This article belongs to the Special Issue Fusion of High-Level Remote Sensing Products)


Abstract

In this study, a machine learning algorithm was introduced to fuse gridded snow depth datasets. The input variables of the machine learning method included geolocation (latitude and longitude), topographic data (elevation), gridded snow depth datasets and in situ observations. In situ observations from a total of 29,565 stations were used to train and optimize the machine learning algorithm. Five gridded snow depth datasets were used as input variables: Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) snow depth, Global Snow Monitoring for Climate Research (GlobSnow) snow depth, Long time series of daily snow depth over the Northern Hemisphere (NHSD) snow depth, ERA-Interim snow depth and Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) snow depth. The first three snow depth datasets are retrieved from passive microwave brightness temperature or assimilation with in situ observations, while the last two are snow depth datasets obtained from meteorological reanalysis data with a land surface model and data assimilation system. Then, three machine learning methods, i.e., Artificial Neural Networks (ANN), Support Vector Regression (SVR), and Random Forest Regression (RFR), were used to produce a fused snow depth dataset from 2002 to 2004. The RFR model performed best and was thus used to produce a new snow depth product from the fusion of the five snow depth datasets and auxiliary data over the Northern Hemisphere from 2002 to 2011. The fused snow depth product was verified at five well-known snow observation sites. The R² values at Sodankylä, Old Aspen, and Reynolds Mountain East were 0.88, 0.69, and 0.63, respectively. At the Swamp Angel Study Plot and Weissfluhjoch observation sites, which have an average snow depth exceeding 200 cm, the fused snow depth did not perform well. The spatial patterns of the average snow depth were analyzed seasonally, and the average snow depths of autumn, winter, and spring were 5.7, 25.8, and 21.5 cm, respectively. In the future, random forest regression will be used to produce a long time series of a fused snow depth dataset over the Northern Hemisphere or other specific regions.

Keywords:

snow depth datasets; data fusion; machine learning algorithms; the Northern Hemisphere


1. Introduction

Snow cover is a fundamental component of the global energy and water cycles [1,2]. The extent and duration of the Northern Hemisphere snow cover have been substantially reduced as a result of the warming of surface temperatures [3]. Snow depth is an even more crucial parameter than snow cover area for climate studies, hydrological applications, weather forecasts, and disaster assessments [4,5,6,7,8]. However, reliable quantitative information on seasonal snow depth or snow water equivalent (SWE) and their trends is lacking [9,10,11,12]. Currently available hemispheric snow depth gridded products include datasets derived from microwave remote sensing brightness temperature, model simulations or data assimilation, and reanalysis [10,11]. The ability of current methods and products to give accurate snow depth estimates is limited by a number of topographic and climatic factors [9,12,13]. Previous studies have assessed these snow depth datasets at Northern Hemisphere and regional scales [14,15,16,17,18]. These assessments indicated that remotely sensed snow depth agrees better with ground observations in shallow snow conditions (0–10 cm) [9,12,13,19,20,21]. Likewise, reanalysis datasets are susceptible to biases from various structural limitations (e.g., elevation biases tied to spatial resolution) and uncertainties in the climate mean state [2,8]. The general spatial resolution of the reanalysis snow depth datasets is about 1°, which is too coarse to be used in hydrological and ecological simulations [22,23,24]. Mudryk et al. [11] compared various gridded products across the Northern Hemisphere and observed large spreads in SWE with magnitudes on the order of the SWE interannual variability, with a relative uncertainty of approximately 50% in the climatological hemispheric peak snow mass and even higher uncertainties in mountain regions. In an evaluation of five gridded snow depth products over the Northern Hemisphere, Global Snow Monitoring for Climate Research (GlobSnow) and ERA-Interim showed better overall agreement with ground observations than the other datasets, and remarkable differences among the products were found. Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) and AMSR2 agreed better with in situ observations in shallow snow conditions (0–10 cm), while Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) performed better when snow depth exceeded 50 cm [9]. Mortimer et al. [10] used a seven-dataset ensemble algorithm. Their results showed that the ensemble dataset reduced the root mean square error (RMSE) by 10 mm (20%) and increased the correlation from 0.67 to 0.78 compared to any individual product. Wang et al. [25] developed a multifactor power snow depth downscaling model and significantly improved the accuracy compared with the AMSR2 snow depth product and others in the Tibetan Plateau. The RMSE and mean absolute error (MAE) of this downscaled product were reduced to 2.00 and 0.25 cm, respectively. Zhu et al. [26] used a backpropagation neural network algorithm to downscale snow depth based on microwave data, optical remote sensing data, and ground observations in Northern Xinjiang (NX), China. The downscaled snow depth dataset, with a spatial resolution of 500 m, had the lowest RMSE and MAE (8.16 and 4.73 cm, respectively) among the datasets compared in NX.

Machine learning methods have become an important tool in environmental remote sensing since the 1990s and eventually spread to many application areas [27,28,29]. Machine learning was first applied to snow depth retrieval by Tedesco et al. [30], who used K-band (~19 GHz) and Ka-band (~37 GHz) vertical and horizontal brightness temperatures as input variables, with the national operational snow observations as the target data. Tedesco et al. [30] compared snow depth retrieved from an Artificial Neural Network (ANN) to values from the spectral polarization difference (SPD) algorithm [31] and the Chang algorithm [32]. The results indicated that the ANN trained with observations outperformed the other methods. Later, Cao et al. [33] combined ANN with Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) brightness temperature to retrieve snow depth over the Tibetan Plateau. The results indicated that ANN was able to derive a more precise retrieval output in complex terrain areas. Snauffer et al. [8] employed ANN to combine six gridded SWE products in British Columbia, Canada, and derive a new SWE product. This new product performed better than the individual products or the mean of these products. More recently, new advanced machine learning methods have been developed to retrieve snow depth at regional [34,35] or hemispherical scales [36,37]. Liang et al. [35] employed the support vector machine (SVM) method to retrieve snow depth over northern Xinjiang with visible and infrared surface reflectance, brightness temperature and auxiliary data. SVM outperformed the SPD method [31], the Che algorithm in China [38] and ANN in Finland [30]. The results also revealed that more input variables could improve precision. Xiao et al. [37] found that the SVM method performed well in snow depth retrieval from passive microwave brightness temperature and auxiliary data, and thus used it to generate a long time series of snow depth for the Northern Hemisphere [36]. Yang et al. [34] were the first to use Random Forest (RF) to derive a long time series of a snow depth product, which was more precise than the Che algorithm output [38]. RF was also the most effective method at reducing bias in SNOw Data Assimilation System (SNODAS) SWE in Ontario, Canada, with an absolute mean bias of 0.2 mm and RMSE of 3.64 mm when compared with in situ observations [39].

These papers demonstrated the potential of machine learning methods to produce more accurate snow depth estimates, but they did not directly incorporate existing snow depth products over the Northern Hemisphere. The existing snow depth datasets, which were produced from passive microwave brightness temperature and in situ observations or from reanalysis data, are based on complex physical theory and production processes. Although these datasets have individual advantages, there is a strong need to fuse them into a new product that incorporates their original characteristics. Regional climate models can provide higher-resolution snow depth information for specific regions. However, the computational cost related to complex atmospheric physics schemes limits the production of a product with a long time series for the entire Northern Hemisphere [40]. Statistical downscaling of snow depth is also appropriate only for specific small areas [25]. Previous studies have demonstrated the potential for using ensembles of multiple snow depth products to improve the accuracy of snow depth datasets [8,10] and constrain uncertainty [9,11]. At present, machine learning provides a suitable approach to fuse snow depth datasets over large scales.

The objectives of this study are twofold: (1) to test the performance of different machine learning methods for snow depth data fusion, and (2) to produce high-quality snow depth data by using a suitable machine learning method based on five gridded snow depth datasets and in situ observations over the Northern Hemisphere. Section 1 presents the research background and significance. Section 2 introduces the datasets and methodologies. Section 3 compares the three machine learning methods and validates the fused snow depth dataset against independent in situ observations. Section 4 discusses the effect of the different input elements and compares the results with those of previous studies, and Section 5 summarizes this work.

2. Data and Methods

2.1. Data

The snow depth datasets used in this study include three types: remote sensing snow depth datasets (i.e., AMSR-E, NHSD and GlobSnow), reanalysis snow depth datasets (i.e., MERRA-2 and ERA-Interim) and ground-based observations. The three remote sensing snow depth datasets and two reanalysis snow depth datasets were considered as independent variables (Table 1). Auxiliary data mainly include land cover data and topographical data.

2.1.1. Remote Sensing Snow Depth Datasets

(1): AMSR-E Snow Depth Dataset

The Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) is a passive microwave sensor onboard the Aqua satellite. The AMSR-E daily remote sensing snow depth dataset was acquired from the Japan Aerospace Exploration Agency (JAXA, https://suzaku.eorc.jaxa.jp/, accessed on 9 February 2021). This dataset uses an improved Chang algorithm that takes the forest fraction into account [41]. If the snow was classified as shallow based on a passive microwave brightness temperature threshold, the snow depth was set to 5 cm; if the snow was classified as deep, the improved Chang algorithm was applied to retrieve the snow depth. This study used the AMSR-E daily snow depth dataset with a spatial resolution of 0.25°. The daily images do not fully cover the Northern Hemisphere: striped gaps occur south of 55°N. We used data from the two adjacent days to fill these gaps before the fusion process. AMSR-E was launched in 2002 and ceased operation in 2011. For this study, we selected all the available data from 2002 to 2011.

(2): GlobSnow Snow Depth Dataset

Global Snow Monitoring for Climate Research (GlobSnow) is a Northern Hemispheric SWE dataset from the European Space Agency (ESA, http://www.globsnow.info/swe/, accessed on 9 February 2021) that was based on the assimilation of satellite microwave radiometer data and weather station data [7]. This method assimilates the daily in situ snow depth into the Helsinki University of Technology (HUT) snow microwave emission model to improve the simulation accuracy, obtaining more accurate snow parameters. The spatial coverage of GlobSnow SWE is 35°N~85°N, with an original spatial resolution of 25 km × 25 km. In this study, the GlobSnow SWE product was transformed into snow depth by dividing it by a constant snow density of 0.24 g/cm³. The snow depth dataset was resampled to a spatial resolution of 0.25° × 0.25° to match the other datasets. Some days for the month of September were not calculated as the data were missing in the GlobSnow SWE product. The spatial coverage of all snow depth datasets was limited to 35°N~85°N, matching the spatial coverage of the GlobSnow dataset.
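The SWE-to-depth conversion described above can be illustrated with a minimal sketch. It assumes that SWE is given in millimetres of water and that depth is wanted in centimetres; the function name and the unit handling are illustrative and not taken from the paper.

```python
def swe_to_snow_depth_cm(swe_mm, snow_density_g_cm3=0.24):
    """Convert SWE (mm of water) to snow depth (cm) with a constant snow density.

    Assumption: SWE is in millimetres; water density is 1.0 g/cm^3.
    """
    if snow_density_g_cm3 <= 0:
        raise ValueError("snow density must be positive")
    water_density_g_cm3 = 1.0
    swe_cm = swe_mm / 10.0  # mm of water -> cm of water
    return swe_cm * water_density_g_cm3 / snow_density_g_cm3


# Example: 24 mm of SWE corresponds to about 10 cm of snow at 0.24 g/cm^3.
print(swe_to_snow_depth_cm(24.0))
```

A constant density ignores the densification of the snowpack during the season, so the same SWE always maps to the same depth regardless of date.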

(3): NHSD Snow Depth Dataset

The long time series of daily snow depth over the Northern Hemisphere (NHSD) [38,42] is available from the National Tibetan Plateau Data Center (TPDC, https://poles.tpdc.ac.cn/, accessed on 9 February 2021). The dataset was produced from multiple-sensor passive microwave brightness temperature data using a modified Chang algorithm [32]. This is a dynamic algorithm that was developed at a pixel scale based on in situ snow depth observations. For every pixel with available observations, a linear equation between in situ snow depth and the brightness temperature gradient was built for each month. The coefficients of these equations were interpolated to all pixels in the Northern Hemisphere. In forested areas, the forest cover fraction was used to decrease the influence of forests. In addition, to improve the temporal consistency of the long time series of brightness temperatures, an inter-sensor calibration was performed between neighboring sensors.

2.1.2. Reanalysis Snow Depth Datasets

(1): ERA-Interim Snow Depth Dataset

ERA-Interim [22] is the fourth generation of reanalysis data from the European Center for Medium-Range Weather Forecasts (ECMWF, https://apps.ecmwf.int/datasets/data/interim-full-daily/, accessed on 9 February 2021). The snow-related parameters of ERA-Interim are derived from the hydrology tiled ECMWF schemes (TESSEL) for surface exchange over land. In this study, snow depth was calculated from snow density and SWE. The SWE and snow density datasets were downloaded with a resampled spatial resolution of 0.25° × 0.25° and temporal resolution of 6 h. First, the average 6-hourly snow depths were calculated from the SWE and snow density; then, these snow depths were averaged to daily data.
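As a rough illustration of the two-step calculation above (6-hourly depth from SWE and density, then a daily mean), the following sketch assumes that SWE is expressed in metres of water equivalent and snow density in kg/m³. These units and the function name are assumptions made for the example.

```python
def daily_snow_depth_cm(swe_m_6h, density_kg_m3_6h):
    """Average the 6-hourly snow depths of one day.

    Assumptions: SWE in metres of water equivalent, snow density in kg/m^3,
    water density 1000 kg/m^3. Returns the daily mean depth in centimetres.
    """
    if len(swe_m_6h) != len(density_kg_m3_6h) or not swe_m_6h:
        raise ValueError("need equally long, non-empty 6-hourly series")
    depths_cm = []
    for swe_m, rho in zip(swe_m_6h, density_kg_m3_6h):
        if rho <= 0:
            raise ValueError("snow density must be positive")
        depth_m = swe_m * 1000.0 / rho  # depth = SWE * rho_water / rho_snow
        depths_cm.append(depth_m * 100.0)
    return sum(depths_cm) / len(depths_cm)


# Four 6-hourly values for one day, each equivalent to about 10 cm of snow.
print(daily_snow_depth_cm([0.02, 0.02, 0.03, 0.03], [200.0, 200.0, 300.0, 300.0]))
```

Computing the depth at each time step before averaging, as the text describes, is not the same as dividing the daily mean SWE by the daily mean density when density changes during the day.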

(2): MERRA-2 Snow Depth Dataset

The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) [43] is produced by the Global Modeling and Assimilation Office of NASA (GMAO, https://disc.gsfc.nasa.gov/datasets/, accessed on 9 February 2021). MERRA-2 offers several key atmospheric and surface variables on a global scale. The Catchment land surface model [43] is applied in MERRA-2. We derived the mean snow depth of each pixel from the average snow depth of the snow-covered area in the pixel and the snow cover fraction. The original spatial resolution of this product was 0.5° × 0.625°. We resampled it to 0.25° × 0.25° by nearest-neighbor interpolation to match the other datasets.
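The two operations mentioned above can be sketched as follows, assuming that the pixel mean is the depth over the snow-covered area multiplied by the snow cover fraction, and that nearest-neighbor resampling picks the closest source cell centre along each axis. Both helper names are hypothetical.

```python
def pixel_mean_depth_cm(depth_over_snow_cm, snow_cover_fraction):
    """Pixel-mean depth from the depth over the snow-covered part of the pixel."""
    if not 0.0 <= snow_cover_fraction <= 1.0:
        raise ValueError("snow cover fraction must lie in [0, 1]")
    return depth_over_snow_cm * snow_cover_fraction


def nearest_index(source_centres, target_centre):
    """Index of the source cell centre closest to a target cell centre (one axis)."""
    return min(range(len(source_centres)),
               key=lambda i: abs(source_centres[i] - target_centre))


# 40 cm of snow covering a quarter of the pixel gives a 10 cm pixel mean.
print(pixel_mean_depth_cm(40.0, 0.25))
# A 0.25-degree target centre at 35.375 takes its value from the source centre at 35.5.
print(nearest_index([35.0, 35.5, 36.0], 35.375))
```

Applying `nearest_index` separately to the latitude and longitude axes gives the source cell for each target cell on a regular grid.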

2.1.3. Ground-Based Measurement

Ground-based observations were used to construct and validate the machine learning snow depth fusion models. There are four sources of ground observations, including the meteorological station data from China and Russia, snow survey data from Russia, and the Global Historical Climatology Network (GHCN) daily dataset. In this study, we selected the observations available for the period from 2002 to 2011.

Daily snow depth of China was collected from the national meteorological information center of the Chinese Meteorological Administration (http://data.cma.cn/, accessed on 9 February 2021), with 923 stations used in this study. This dataset offers daily snow depth, location, and elevation of the station. Daily snow depth is manually observed at 8:00 a.m. with a ruler. These data were calibrated and quality checked before they were released on the national meteorological data platform.

Daily snow depth from Russia from 2002 to 2011 was derived from the Russian meteorological center (http://aisori.meteo.ru/ClimateR, accessed on 9 February 2021). Snow depth, location and elevation of the station, and the quality control field were obtained from the dataset. Anomalous records were flagged in this dataset during the quality check. After screening, 576 stations remained in this dataset.

The snow survey data of Russia were also obtained through the Russian meteorological center (http://aisori.meteo.ru/, accessed on 9 February 2021). This field survey dataset contains the snow depth (i.e., deepest, shallowest, and average snow depth) and snow density every 5 to 10 days from September to May. Snow depth values larger than the deepest snow depth or smaller than the shallowest snow depth were regarded as anomalous and eliminated in this study. Finally, 514 valid stations remained for this study.

The Global Historical Climatology Network (GHCN) daily dataset (https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C00861, accessed on 9 February 2021) provides snow depth data and elevation. Data in this dataset were filtered according to the quality control field. Data that failed in internal consistency check, climatological outlier check, spatial or temporal check, etc., were marked out in the quality control field and removed. Finally, 27,552 stations remained for this study.

2.1.4. Auxiliary Data

(1): Reclassification of Land Cover Data

The land cover data used in this study were from GlobCover 2009 (https://due.esrin.esa.int/page_globcover.php, accessed on 9 February 2021), which was produced by ESA and the Université Catholique de Louvain (UCL). GlobCover 2009 includes 23 class types according to the United Nations Land Cover Classification System (LCCS) (Table A1, Appendix A). Consistent with previous studies [37], these types were reclassified into five classes: forest, shrub, grassland (prairie), bare land, and unclassified (Figure 1; Table A1, Appendix A). The unclassified type was excluded when executing the machine learning snow depth fusion algorithm. The original spatial resolution of GlobCover is 300 m × 300 m. To match the spatial resolution of the snow depth datasets, the data were resampled to a grid of 0.25° × 0.25°, and the land cover type covering the largest proportion of each grid cell was taken as the land cover of that cell. Separate snow depth fusion models were established for forest, grassland, shrub, and bare land.
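The "largest proportion" rule used for resampling the land cover can be sketched as a majority filter over blocks of fine cells. This is a simplified illustration that assumes the coarse cell size is an integer multiple of the fine cell size, which is not exactly true for 300 m cells on a 0.25° grid; the function name is hypothetical.

```python
from collections import Counter


def aggregate_majority(grid, factor):
    """Aggregate a 2-D class grid by taking the most frequent class in each
    factor x factor block. Incomplete blocks at the edges are dropped."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows - rows % factor, factor):
        coarse_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [v for i in range(r, r + factor) for v in grid[i][c:c + factor]]
            coarse_row.append(Counter(block).most_common(1)[0][0])
        coarse.append(coarse_row)
    return coarse


fine = [
    ["forest", "forest", "shrub", "bare"],
    ["forest", "shrub", "bare", "bare"],
    ["grass", "grass", "grass", "shrub"],
    ["grass", "forest", "shrub", "shrub"],
]
print(aggregate_majority(fine, 2))
```

When two classes tie within a block, `Counter.most_common` returns the one encountered first, so a real workflow would need an explicit tie-breaking rule.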

(2): Topographic Data

Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010), archived at https://topotools.cr.usgs.gov/gmted_viewer/ (accessed on 9 February 2021), is an update of GTOPO30 and is produced by the United States Geological Survey (USGS). GMTED2010 offers three spatial resolutions: 30 arc-seconds, 15 arc-seconds, and 7.5 arc-seconds [44]. In this study, the data with a spatial resolution of 30 arc-seconds were resampled to the 0.25° × 0.25° grid used for snow depth fusion.

2.2. Methodology and Experimental Design

In this paper, three widely used machine-learning methods (i.e., Random Forest Regression (RFR), Support Vector Regression (SVR), and ANN) were adopted. This section provides a general description of the three machine learning methods, experimental design and assessment index of model performance.

2.2.1. Machine Learning Methods

ANN is typically composed of interconnected neuronal units organized in layers and can be used in problems of classification and regression [45]. In this study, we applied the backpropagation artificial neural network (BP-ANN). Generally, the BP-ANN model has three layers, namely the input layer, hidden layer and output layer. The input variables were propagated from the input layer to the output layer through the hidden layer, while the error was transmitted in the opposite direction, thereby correcting the connection weight of the network [46]. A neural network structure consists of a transfer function, a learning algorithm, many hidden layers, training and predicting datasets [47]. In this work, the transfer functions tan-sigmoid and purelin were applied from the input layer to the hidden layer and from the hidden layer to the output layer, respectively. A combination of a gradient descent method and the Gauss-Newton method was adopted as the learning algorithm.

Support Vector Regression (SVR) is a supervised learning algorithm for regression that is grounded in statistical learning theory [48,49]. In the SVR model, the input variables are first mapped into a high-dimensional feature space using a kernel function, either linear or non-linear depending on the relationship between the dependent and independent variables. Then, a linear model is constructed in the feature space to balance minimizing errors against overfitting [50]. SVR is gaining popularity because of its many attractive features and promising generalization performance. The input vector of SVR consists of the geophysical variables at a given location in space and time. Selecting a suitable kernel function is very important in this method. In this study, the radial basis function (RBF) kernel was chosen for model training and prediction.
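For readers unfamiliar with the radial kernel, the sketch below evaluates the standard Gaussian form K(x, z) = exp(−γ‖x − z‖²). The value of γ is an arbitrary illustration, since the paper does not report its kernel parameters, and the function name is hypothetical.

```python
from math import exp


def rbf_kernel(x, z, gamma):
    """Gaussian radial basis function kernel between two feature vectors."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return exp(-gamma * squared_distance)


# Identical inputs give a similarity of 1; the value decays with distance.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

Because the kernel depends on Euclidean distance, inputs with very different scales (for example elevation in metres and snow depth in centimetres) are usually standardized first.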

Random Forest Regression (RFR) is an ensemble learning technique that combines a large set of decision trees for regression, with each regression tree built independently of the others [51]. The predictions of the randomized decision trees are aggregated by averaging [52]. RFR generally requires only two user-defined parameters: the number of trees in the ensemble and the number of randomly selected candidate variables at each split. The RFR model has been widely used in remote sensing information extraction because of its high flexibility and precision. Because the randomness compensates for the bias introduced by a single decision tree, RFR does not easily overfit, achieves high accuracy and trains quickly; thus, it is suitable for dealing with big data. In this paper, we used the randomForest R package on the cloud platform supported by the Big Earth Data Science Engineering Project of Chinese Academy of Science (CASEarth) (http://workbench.casearth.cn/, accessed on 9 February 2021).
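The idea of averaging many randomized trees can be illustrated with a deliberately small stand-in: bootstrap-aggregated one-split regression trees ("stumps") with random candidate features. This is not the randomForest package used in the study and is far simpler than a real random forest; all names and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, candidate_features):
    """Fit a one-split regression tree using only the candidate features."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in candidate_features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:  # all candidate features are constant in this sample
        m = mean(y)
        return (candidate_features[0], float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, n_trees=50, mtry=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample with mtry random features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        features = rng.sample(range(p), mtry)       # random candidate variables
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict(forest, row):
    """Average the predictions of all stumps."""
    return mean([ml if row[j] <= thr else mr for j, thr, ml, mr in forest])


# Toy data: the target depends on the first feature only.
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 8.0], [3.0, 1.0],
     [10.0, 2.0], [11.0, 7.0], [12.0, 4.0], [13.0, 6.0]]
y = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 9.0]
forest = fit_forest(X, y, n_trees=50, mtry=2, seed=0)
print(predict(forest, [0.0, 0.0]), predict(forest, [13.0, 0.0]))
```

The two arguments `n_trees` and `mtry` play the role of the two user-defined parameters mentioned above; a full implementation grows deep trees and redraws the candidate variables at every split.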

2.2.2. Experimental Design

Based on previous assessments, the performance of different snow depth products shows inconsistencies in different seasons and landcover types [11,15,17]. A complete snow cover year was defined as the period between September of the previous year (t-1) and May of the current year (t). Additionally, a complete snow year was divided into three snow-seasons [16,53]: autumn (September to November), winter (December to February) and spring (March to May). In this study, seasonal information and land cover types were used as a priori conditions to form 12 models. In total, three machine learning algorithms were applied in this study, thus extending these 12 models to 36 models (Figure 2).
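The season and snow-year definitions above, and the resulting 12 land cover and season combinations, can be written down compactly. The sketch below is illustrative only; the class labels and function names are hypothetical.

```python
from datetime import date
from itertools import product

SEASONS = {"autumn": (9, 10, 11), "winter": (12, 1, 2), "spring": (3, 4, 5)}
LAND_COVERS = ("forest", "shrub", "grassland", "bare land")


def season_of(day):
    """Snow season of a date, or None for June to August."""
    for name, months in SEASONS.items():
        if day.month in months:
            return name
    return None


def snow_year_of(day):
    """Snow year t runs from September of year t-1 to May of year t."""
    if day.month >= 9:
        return day.year + 1
    if day.month <= 5:
        return day.year
    return None


# One model per (land cover, season) pair: 4 x 3 = 12 per learning method.
MODEL_KEYS = list(product(LAND_COVERS, SEASONS))

print(season_of(date(2003, 1, 15)), snow_year_of(date(2002, 9, 1)), len(MODEL_KEYS))
```

Repeating these 12 keys for each of the three learning methods gives the 36 models compared in the study.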

In these models, the input variables were the AMSR-E, NHSD, GlobSnow, MERRA-2, and ERA-Interim snow depth datasets together with latitude, longitude, and elevation. In the model training phase, these eight variables were the independent variables and the in situ observations were the dependent variable. In the model prediction phase, the same eight input variables were used and the output was the predicted snow depth. Because of the large number of samples in an entire hydrological year (Table 2), the comparison of the three machine learning methods was based only on the snow hydrological years from 2002 to 2004 to limit the computing cost. Training samples were extracted from 1 September 2002 to 31 May 2003, and the prediction samples were extracted from 1 September 2003 to 31 May 2004. For each land cover type and season, the model parameters were set according to the number of training samples.

The selected datasets were used for model training and prediction. All the input variables and in situ observations were normalized to have a mean of 0 and a standard deviation of 1 [54]. The selected samples were divided into two parts: 80% of the samples were used for training the model, and the remaining 20% were used for model prediction.
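The normalization and the 80/20 division can be sketched as follows. The paper does not state whether the split was random or which standard deviation estimator was used; the random shuffle, the fixed seed and the population standard deviation below are assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and standard deviation 1.

    Returns the standardized values plus the mean and standard deviation,
    which are needed to transform predictions back to centimetres.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values], mu, sigma


def split_80_20(samples, seed=0):
    """Shuffle the samples and return (training 80%, prediction 20%)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


z, mu, sigma = zscore([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
train, held_out = split_80_20(range(10))
print(mu, sigma, len(train), len(held_out))
```

In practice the mean and standard deviation would be computed on the training part only and then reused for the held-out part, to avoid leaking information between the two.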

These 36 fused snow-depth models were evaluated by the coefficient of determination (R²), root mean square error (RMSE) and mean absolute error (MAE). We also calculated the bias between the fused values and the in situ observations (BIAS) to evaluate the spatial error between the fusion dataset and observations:

R^{2} = 1 - \frac{\sum_{i=1}^{n} (S_{i} - \hat{S}_{i})^{2}}{\sum_{i=1}^{n} (S_{i} - \bar{S})^{2}},

(1)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (S_{i} - \hat{S}_{i})^{2}}{n}},

(2)

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |S_{i} - \hat{S}_{i}|,

(3)

\mathrm{BIAS} = \hat{S}_{i} - S_{i},

(4)

where n is the number of sample pixels, S_{i} and \hat{S}_{i} denote the in situ observation and the fused snow depth value of the i-th pixel, respectively, and \bar{S} represents the mean value of the in situ observations of the n pixels.
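Equations (1) to (4) translate directly into code. The sketch below is a plain Python rendering with hypothetical names; BIAS is returned per pixel, following Equation (4).

```python
from math import sqrt


def fusion_metrics(observed, fused):
    """Return R^2, RMSE, MAE and the per-pixel BIAS (fused minus observed)."""
    n = len(observed)
    if n == 0 or n != len(fused):
        raise ValueError("need equally long, non-empty series")
    mean_obs = sum(observed) / n
    ss_res = sum((s - s_hat) ** 2 for s, s_hat in zip(observed, fused))
    ss_tot = sum((s - mean_obs) ** 2 for s in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant observations")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(s - s_hat) for s, s_hat in zip(observed, fused)) / n,
        "bias": [s_hat - s for s, s_hat in zip(observed, fused)],
    }


# Four pixels with observed and fused snow depths in cm.
print(fusion_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0]))
```

With this sign convention, a negative BIAS means that the fused product underestimates the observed snow depth.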

The optimal machine learning method was selected to fuse the snow depth dataset. A “leave-one-year-out” cross-validation for each divided dataset was conducted to determine the performance of the optimal method in continuous time series estimation of snow depth. Finally, a time series comprehensive snow depth dataset covering the Northern Hemisphere was derived from 2002 to 2011.
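A leave-one-year-out scheme holds out each snow year in turn and trains on all remaining years. The generator below is a minimal sketch of that fold construction; the data layout (a dictionary keyed by snow year) and the function name are assumptions.

```python
def leave_one_year_out(samples_by_year):
    """Yield (held_out_year, training_samples, validation_samples) for each year."""
    for held_out in sorted(samples_by_year):
        training = [
            sample
            for year in sorted(samples_by_year)
            if year != held_out
            for sample in samples_by_year[year]
        ]
        yield held_out, training, list(samples_by_year[held_out])


# Placeholder sample identifiers grouped by snow year.
data = {2003: ["a", "b"], 2004: ["c"], 2005: ["d", "e"]}
for year, training, validation in leave_one_year_out(data):
    print(year, training, validation)
```

Holding out whole years rather than random samples tests whether a model trained on some winters transfers to a winter it has never seen.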

3. Results

3.1. Comparison among the Fused Snow Depth from Three Machine Learning Methods

In this study, the three machine learning methods were each applied to four land cover classes and three seasons (autumn (September to November), winter (December to February), and spring (March to May)), giving 12 models per method and 36 sets of accuracy assessment indices in total. The results of these 36 snow-depth fusion models are presented in Table 2. In the model comparison phase, the input variables for RFR, SVR, and ANN were the same. For the same season and land cover type, the RFR model had a higher R² and lower RMSE and MAE, indicating that the RFR model was preferable to ANN and SVR. In spring (March to May) in particular, the RMSE and MAE values of RFR were substantially lower than those of ANN and SVR. RFR was also computationally more efficient than ANN and SVR. Therefore, the RFR model was used to produce the fused snow depth data over the Northern Hemisphere from 2002 to 2011.

3.2. Comparison between the Fused Dataset and Five Other Snow Depth Datasets Based on Observations

The fused snow depth dataset shows better R², RMSE, and MAE than the five original snow-depth datasets. The assessment indicates that the original five snow-depth datasets have weak correlations with the observed snow depths. The RFR fusion, therefore, significantly improves the accuracy of the snow-depth datasets: the R² increases to 0.91, and the RMSE and MAE decrease to 5.5 and 2.2 cm, respectively (Figure 3). Based on the accuracy assessment, we found that the snow depth dataset fused with the RFR algorithm is very consistent with the station observations.

The fused snow depth values are distributed near the 1:1 line (Figure 4).

This result indicates that most pixels have snow-depths of less than 50.0 cm, and many pixels have a very high fusion accuracy. The spatial distribution of the BIAS shows that 89.85% of the pixels (9293 pixels out of a total of 10,343) have a BIAS between −5.0 and 5.0 cm (Figure 5).
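The share of pixels within the ±5.0 cm band follows from a simple count. The sketch below assumes that the band limits are inclusive, which the text does not specify, and uses a hypothetical function name; the last line reproduces the percentage quoted above from the two pixel counts.

```python
def share_within_percent(bias_values_cm, limit_cm=5.0):
    """Percentage of pixels whose BIAS lies in [-limit_cm, limit_cm]."""
    if not bias_values_cm:
        raise ValueError("need at least one BIAS value")
    inside = sum(1 for b in bias_values_cm if -limit_cm <= b <= limit_cm)
    return 100.0 * inside / len(bias_values_cm)


# Toy example: three of five pixels fall inside the band.
print(share_within_percent([-6.0, -1.0, 0.5, 4.9, 7.2]))
# Check of the reported share: 9293 of 10,343 pixels.
print(round(100.0 * 9293 / 10343, 2))
```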

The fused results also show that only very few pixels have a BIAS value greater than 5.0 cm, or less than −5.0 cm. The spatial pattern of the BIAS (Figure 6) exhibits a good consistency with station observations over the Northern Hemisphere, especially at low latitudes.

3.3. Accuracy Assessment of the Fused Snow Depth Dataset at Five Independent In Situ Snow Observation Sites

To further verify the accuracy of the fused snow depth dataset, five independent snow sites recommended by the Earth System Model-Snow Model Intercomparison Project (ESM-SMIP) were selected for validation. The detailed information of these five sites is in Table A2, Appendix A. These sites have in situ snow depth and SWE data. The series of snow depth data were extracted from the fused snow dataset and observations using overlapping time series (Figure 7).

Among these five sites, Sodankylä (SOD) [55] performed best, with R², RMSE and BIAS of 0.88, 8.6 and 4.0 cm, respectively (Figure 7a). Although the accuracies at Old Aspen (OAS) [56] and Reynolds Mountain East (RME) [57] are not as good as that at Sodankylä, they are still within the accepted range. At the Swamp Angel Study Plot (SASP) [58] and Weissfluhjoch (WFJ) [59] sites, which have deeper snow, the fused snow depth does not perform well and shows a prominent underestimation. The five original gridded snow depth products (Figure A1, Appendix A) all show a distinct underestimation at these sites, indicating that the input variables of the RFR model are very important to the fused results. The geographical location and topographic conditions at SASP are complex. SASP is in a sheltered area and is surrounded by a sub-alpine forest. Therefore, it experiences lower winds and is better suited for measuring precipitation. Under these conditions, deeper snow develops at SASP, which also leads to a distinct underestimation of snow depth by the remote sensors. The WFJ site is located in an almost flat part of a southeasterly oriented slope at an altitude of about 2540 m [59]. During the winter months, deep snow builds up at this altitude. In future work, the RFR modeled dataset should be improved based on this validation.

3.4. The Spatial Distribution of the Fused Snow Depth Dataset Based on Random Forest Regression

Based on the daily fused snow depth dataset, we derived the monthly average, seasonal average, and yearly average snow depth. The multi-year average snow depth and different seasonal average snow depth were also calculated.

The spatial distribution of the fused daily snow depth product over the Northern Hemisphere from 2002 to 2011 is shown in Figure 8a. The spatial pattern of the multi-year average snow depth over the Northern Hemisphere indicates that the regions with deep snow are distributed in the North American and Siberian plains. This spatial pattern also reveals that most parts of the mid and low latitude regions (<50°N) have a shallower snow depth of less than 5.0 cm. The snow depth in China is relatively shallow (<5.0 cm). The fused snow depth in autumn (Figure 8b) is smaller than in winter (Figure 8c) and spring (Figure 8d). The average snow depths of autumn, winter and spring are 5.7, 25.8, and 21.5 cm, respectively. In autumn, snow depth varies from 0 to 66.4 cm, and snow depths in most regions are less than 20.3 cm. In winter, most regions have deeper snow, which varies from 0 to 258.0 cm. In spring, the spatial pattern of snow depth is similar to that in winter. Because the new algorithm incorporates the advantages of remote sensing and reanalysis products, the range of the fused snow depth varies from 0 to more than 200 cm.

4. Discussion

4.1. The Effect of Seasons on the Fused Snow Depth Dataset

In the current study, the snow year was divided into three seasons for the machine learning training and predicting phases, which was consistent with previous studies [16,37,53]. To compare the performance of different machine learning methods in the fusion of the snow depth datasets in different seasons and land cover types, the three seasons were assessed independently. In autumn, the snow depth is shallow (<5.0 cm), so the fused results perform much worse than in winter and spring (Table 2). In autumn, while the absolute values of RMSE and MAE were smaller given the shallower snow depth, the R² had the worst performance. Overall, the machine learning fusion algorithms performed better in deeper snow (>10.0 cm). Currently, the fused snow depth dataset based on the machine learning algorithm has higher accuracy in winter (December to February) and spring (March to May) than in autumn (September to November). The average snow depth over the Northern Hemisphere from 2002 to 2011 varies from 0.7 to 34.6 cm (Figure 9).

In winter and spring, the average snow depth is deeper and more stable, which may reduce some uncertainties, so the fused snow depth performs well in these two seasons. In autumn, the snow depth is shallow and increases slowly. Snow depth retrieval schemes that use microwave brightness temperature rely on a set threshold (e.g., 5.0 cm in GlobSnow), which also introduces some uncertainty. The accuracy assessment indicates that the fused snow depth performs better when the snow depth exceeds about 10.0 cm (Figure 7).

4.2. Improvement between the Current Study and Previous Work

Previous studies that retrieved snow depth employed machine learning algorithms combining brightness temperature with other auxiliary data. The application of machine learning models has improved the estimation accuracy of snow depth [37] and SWE [60] over the Northern Hemisphere. Tedesco and Jeyaratnam [60] derived a densification formula using Bayesian statistics for SWE estimation from passive microwave brightness temperature observations based on in situ snow depth, density, and SWE for each snow climate class. NASA’s current V2.0 AMSR-E SWE algorithm utilizes an artificial neural network, snow emission modeling, and climatological snow depth data for the estimation of snow depth and the detection of dry versus wet snow conditions [60].

Our study differed mainly with regard to the selected input variables. Xiao et al. [37] employed SVM with passive microwave brightness temperatures and auxiliary data to derive a long time series of daily snow depth over the Northern Hemisphere. The evaluation of this snow depth product against two other snow cover products (GlobSnow and ERA-Interim/Land) showed that it performs comparably well, with relatively high accuracy [36]. Snauffer et al. [8] used ANN combined with several gridded snow datasets to derive a comprehensive SWE product. In this study, five snow depth datasets were fused into a new snow depth product based on machine learning methods.

As described in the aforementioned studies, using machine learning methods and brightness temperature can improve the accuracy of the snow depth inversion. Brightness temperature data of some sensors have striped gaps resulting in missing data in some areas [41]. However, the snow depth product provides continuous coverage and avoids missing data. In the current study, the fused accuracy improved because the input variables were already snow depth products.

4.3. Determining the Input Parameters of the RFR Model

In the experimental design, all gridded snow depth products and auxiliary data were considered as the input parameters for machine learning methods (Section 2.2.2). To test the importance of these input variables for the model, a scheme (Table 3) was designed to verify the result.

As described in Table 3, one input variable was deleted in every scheme, and then the importance of that variable was evaluated. In the designed schemes, deleting a variable changes the accuracy of the fused snow depth, but the changes are not significant in terms of RMSE and MAE (Table 4).

Although the ERA-Interim model occupied the second rank in terms of accuracy performance, it was excluded. We cannot conclude that ERA-Interim was not important for the model. Accuracy assessments of gridded snow depth datasets indicated that ERA-Interim exhibits better overall agreement with in situ observations than other datasets [9]. The rank of our input snow depth datasets did not indicate whether a given variable was important or insignificant for the model. The model performed best when all variables were input into the fusion model. Therefore, we selected as many gridded snow depth products as possible to pursue the most accurate results.

The results in Table 4 were obtained with the training samples of 2002–2003, and the model prediction phase used the samples of 2003–2004. The land cover type was Bare-land, and the time period was December to February. “All variables” indicates that all variables were used as input elements, “Elevation excluded” means that the input variables did not contain Elevation, etc. Deleting the input variables in the RFR model one by one showed that the RFR model is a relatively stable machine learning model and that it could be used to produce a long time series snow depth product.
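The one-variable-at-a-time design can be sketched as below. The eight variable names follow the inputs described in this paper (latitude, longitude, elevation and the five gridded products); the helper names `ablation_schemes` and `run_ablation`, and the callable `fit_and_score`, are hypothetical. The scorer used in the example is a placeholder that only counts the retained variables; in practice it would train the regression model on the retained variables and return an accuracy index.

```python
def ablation_schemes(variables):
    """Map each scheme name to the list of input variables it retains."""
    schemes = {"All variables": list(variables)}
    for dropped in variables:
        schemes[f"{dropped} excluded"] = [v for v in variables if v != dropped]
    return schemes


def run_ablation(variables, fit_and_score):
    """Evaluate every scheme with a user-supplied (hypothetical) callable.

    `fit_and_score(kept)` is assumed to train a model on the variables in
    `kept` and return a score such as RMSE; it is not defined by the paper.
    """
    return {
        name: fit_and_score(kept)
        for name, kept in ablation_schemes(variables).items()
    }


INPUTS = [
    "Latitude", "Longitude", "Elevation",
    "AMSR-E", "NHSD", "GlobSnow", "ERA-Interim", "MERRA-2",
]

schemes = ablation_schemes(INPUTS)
# Placeholder scorer: returns the number of retained variables, not an accuracy.
scores = run_ablation(INPUTS, lambda kept: len(kept))
print(list(schemes))
```

Passing the scorer as a callable keeps the scheme bookkeeping independent of the regression library used for the actual experiment.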

4.4. Limitations of the Current Study

As described in previous relevant papers, machine learning methods can overcome various complex problems in large-scale retrievals [27,28,29]. Machine learning methods can learn from and summarize large amounts of data and do not rely on an understanding of physical processes when modeling [26]. Although the fused snow depth dataset performs well in the accuracy assessment at five independent in situ sites, there are still some limitations that warrant further improvement. First and foremost, these snow depth datasets were not comprehensively evaluated before data fusion. According to the accuracy assessment in the previous study [9], the NHSD, AMSR-E, GlobSnow, ERA-Interim, and MERRA-2 gridded snow depth datasets were selected for direct fusion. Secondly, the input variables in this paper include three geolocation and topographic factors and five gridded snow depth datasets. Machine learning was applied in the Northern Hemisphere to fuse the snow depth datasets. Although many in situ observations were used to train and validate the models, the stations were still sparsely distributed. This algorithm can be applied to obtain a high-precision snow depth dataset in regions with denser observation sites. The model should be modified before applying it to specific regions, and the appropriate input variables should be selected according to the regional conditions. Additionally, the input variables should be consistent between the model training and prediction phases. Thirdly, the fused snow depth was only validated at five independent in situ sites; more in situ stations should be introduced to assess this snow depth dataset more thoroughly. Lastly, we only compared the machine learning models based on the same input samples. The RFR model had a higher R² and lower RMSE and MAE, indicating that the RFR model outperformed ANN and SVR. The different land cover types and seasons were not considered. In future work, samples of the different land cover types and seasons should be statistically analyzed before being input into the machine learning model (Table 2).

5. Conclusions

This study examined three machine learning algorithms (ANN, SVR, and RFR) to fuse the snow depth datasets over the Northern Hemisphere. By comparing the performance of three machine learning methods in 36 models, the models with higher R², smaller RMSE and MAE were selected to fuse the snow depth dataset. The fused snow depth dataset has a high accuracy compared with other snow depth datasets. The main conclusions are:

(1): Comparing the performance of the SVR, ANN, and RFR algorithms in 36 models, the RFR algorithm has a higher R², smaller RMSE and MAE.
(2): The fused dataset based on the RFR model performed better in winter and spring than autumn because there were more training samples in winter and spring; the average snow depth values in winter and spring were deeper than in autumn.
(3): Comparing AMSR-E, NHSD, GlobSnow, MERRA-2, ERA-Interim, and the new fused snow depth datasets with in situ observation snow depths, the result shows that the original five snow-depth datasets have weak correlations with the observed snow depth. The best coefficient of determination between the five original snow depth products and the observations was 0.15 (i.e., the coefficient of determination between GlobSnow and in situ observations), while the value of the fused snow depth increased to 0.91. The spatial pattern of BIAS between fused dataset and observations indicates that the fused dataset performs very well. The comparison of the fused snow depth product with five independent in situ snow observation sites shows that it is the most accurate. However, in some complex situations with deeper snow depths (>200 cm), like in alpine regions and mixed pixel areas, the fused snow depth also does not perform well.

This paper proposed a new data fusion method that was applied to derive a fused snow depth product across the Northern Hemisphere from 2002 to 2011. There is a slight drawback to the fused snow depth dataset, mainly regarding its spatial coverage. The spatial coverage of GlobSnow is from 35°N to 85°N from 2002 to 2011, so the fused dataset does not cover the entire Northern Hemisphere. In future work, the other snow depth datasets (i.e., AMSR-E, NHSD, MERRA-2, and ERA-Interim) should be used to fill the missing regions and to produce a fused snow depth product of the Northern Hemisphere by using the random forest method.

Author Contributions

T.C. conceived and designed the study. Y.H. and L.X. processed the data. Y.H. produced the first draft of the manuscript, which was subsequently edited by T.C., L.D., and L.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Science under Grant XDA19070101, the National Natural Science Foundation of China (grant Nos. 41771389 and 42001289) and the Chinese Academy of Science ‘Light of West China’ Program (E029070101).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The accuracy evaluation data are available from https://doi.org/10.1594/PANGAEA.897575 (accessed on 9 February 2021). The AMSR-E snow depth dataset was downloaded from JAXA, the NHSD snow depth dataset is provided by TPDC, GlobSnow SWE data were downloaded from ESA, and ERA-Interim and MERRA-2 were downloaded from ECMWF and NASA, respectively. The in situ observations were obtained from the meteorological administrations of China and Russia and from GHCN. The fused snow depth dataset is available on request from the corresponding author.

Acknowledgments

The authors thank the editors and two anonymous reviewers for their constructive comments and suggestions for improving this paper. We also want to thank CASEarth for their support during the use of the cloud platform.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Land cover type reclassification of the GlobCover2009 classification system.

Value	Original Class	Reclassification Type
11	Post-flooding or irrigated croplands (or aquatic)	Bare-Land
14	Rainfed croplands	Bare-Land
20	Mosaic cropland (50–70%)/vegetation (grassland/shrubland/forest) (20–50%)	Bare-Land
30	Mosaic vegetation (grassland/shrubland/forest) (50–70%)/cropland (20–50%)	Shrub
40	Closed to open (>15%) broadleaf evergreen or semi-deciduous forest (>5 m)	Forest
50	Closed (>40%) broadleaf deciduous forest (>5 m)	Forest
60	Open (15–40%) broadleaf deciduous forest/woodland (> 5 m)	Forest
70	Closed (>40%) needleleaf evergreen forest (>5 m)	Forest
90	Open (15–40%) needleleaf deciduous or evergreen forest (>5 m)	Forest
100	Closed to open (>15%) mixed broadleaf and needleleaf forest (>5 m)	Forest
110	Mosaic forest or shrubland (50–70%)/grassland (20–50%)	Shrub
120	Mosaic grassland (50–70%)/forest or shrubland (20–50%)	Grassland (Prairie)
130	Closed to open (>15%) (broadleaf or needleleaf, evergreen or deciduous) shrubland (<5 m)	Shrub
140	Closed to open (>15%) herbaceous vegetation (grassland, savannas or lichens/mosses)	Grassland (Prairie)
150	Sparse (<15%) vegetation	Bare-Land
160	Closed to open (>15%) broadleaf forest regularly flooded (semi-permanently or temporarily)—Fresh or brackish water	Forest
170	Closed (>40%) broadleaf forest or shrubland permanently flooded—Saline or brackish water	Forest
180	Closed to open (>15%) grassland or woody vegetation on regularly flooded or waterlogged soil—Fresh, brackish or saline water	Grassland (Prairie)
190	Artificial surfaces and associated areas (Urban areas > 50%)	Bare-Land
200	Bare areas	Bare-Land
210	Water bodies	Water
220	Permanent snow and ice	Bare-Land
230	No data (burnt areas, clouds…)	Unclassified
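The reclassification in Table A1 amounts to a lookup from GlobCover2009 class value to one of the merged types. A minimal sketch is given below; the dictionary reproduces the table, while the function name `reclassify` and the choice to map values not listed in the table to "Unclassified" are assumptions made for illustration.

```python
# GlobCover2009 class value -> reclassification type, as listed in Table A1.
GLOBCOVER_TO_CLASS = {
    11: "Bare-Land", 14: "Bare-Land", 20: "Bare-Land",
    30: "Shrub",
    40: "Forest", 50: "Forest", 60: "Forest",
    70: "Forest", 90: "Forest", 100: "Forest",
    110: "Shrub",
    120: "Grassland (Prairie)",
    130: "Shrub",
    140: "Grassland (Prairie)",
    150: "Bare-Land",
    160: "Forest", 170: "Forest",
    180: "Grassland (Prairie)",
    190: "Bare-Land", 200: "Bare-Land",
    210: "Water",
    220: "Bare-Land",
    230: "Unclassified",
}


def reclassify(value):
    """Return the merged land cover type for a GlobCover2009 class value.

    Values that do not appear in Table A1 are returned as "Unclassified";
    this fallback is an assumption, not a rule stated in the paper.
    """
    return GLOBCOVER_TO_CLASS.get(value, "Unclassified")


print(reclassify(70), reclassify(140), reclassify(999))
```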

Table A2. Data ownership and reference papers for five independent snow observation sites (modified from Ménard et al. [61]).

Site	Short Name	Latitude (°)	Longitude (°)	Elevation (m)	Data Provider	Vegetation Type	Reference Paper
Sodankylä	SOD	67.416	26.59	179	Finnish Meteorological Institute, Finland	Clearing (short heather and lichen) in coniferous forest	[55]
Old Aspen	OAS	54.05	−106.333	600	Environment and Climate Change Canada, Canada	21 m high aspen forest. Thick understory of 2 m high hazelnut. Winter stem area ∼ 1, summer 3.7–5.2	[56]
Reynolds Mountain East	RME	43.186	−116.783	2060	USDA Agricultural Research Service, USA	Clearing (short grass) in an alpine/fir grove	[57]
Swamp Angel Study Plot (SASP)	SWA	37.907	−107.711	3371	Center for Snow and Avalanche Studies, USA	Clearing (short grass) in subalpine forest	[58]
Weissfluhjoch	WFJ	46.827	9.807	2536	WSL Institute for Snow and Avalanche Research, Switzerland	Barren	[59]

Figure A1. Original gridded snow depth datasets of five independent snow observation sites.

References

1. Barnett, T.P.; Adam, J.C.; Lettenmaier, D.P. Potential impacts of a warming climate on water availability in snow-dominated regions. Nature 2005, 438, 303–309.
2. Bormann, K.J.; Brown, R.D.; Derksen, C.; Painter, T.H. Estimating snow-cover trends from space. Nat. Clim. Chang. 2018, 8, 924–928.
3. Brown, R.D.; Mote, P.W. The Response of Northern Hemisphere Snow Cover to a Changing Climate. J. Clim. 2009, 22, 2124–2145.
4. Dressler, K.A.; Leavesley, G.H.; Bales, R.C.; Fassnacht, S.R. Evaluation of gridded snow water equivalent and satellite snow cover products for mountain basins in a hydrologic model. Hydrol. Process. 2006, 20, 673–688.
5. Lievens, H.; Demuzere, M.; Marshall, H.-P.; Reichle, R.H.; Brucker, L.; Brangers, I.; Rosnay, P.d.; Dumont, M.; Girotto, M.; Immerzeel, W.W.; et al. Snow depth variability in the Northern Hemisphere mountains observed from space. Nat. Commun. 2019, 10, 1–12.
6. Nayak, A.; Marks, D.; Chandler, D.G.; Seyfried, M. Long-term snow, climate, and streamflow trends at the Reynolds Creek Experimental Watershed, Owyhee Mountains, Idaho, United States. Water Resour. Res. 2010, 46, W06519.
7. Takala, M.; Luojus, K.; Pulliainen, J.; Derksen, C.; Lemmetyinen, J.; Kärnä, J.P.; Koskinen, J.; Bojkov, B. Estimating northern hemisphere snow water equivalent for climate research through assimilation of space-borne radiometer data and ground-based measurements. Remote Sens. Environ. 2011, 115, 3517–3529.
8. Snauffer, A.M.; Hsieh, W.W.; Cannon, A.J.; Schnorbus, M.A. Improving gridded snow water equivalent products in British Columbia, Canada: Multi-source data fusion by neural network models. Cryosphere 2018, 12, 891–905.
9. Xiao, L.; Che, T.; Dai, L. Evaluation of Remote Sensing and Reanalysis Snow Depth Datasets over the Northern Hemisphere during 1980–2016. Remote Sens. 2020, 12, 3253.
10. Mortimer, C.; Mudryk, L.; Derksen, C.; Luojus, K.; Brown, R.; Kelly, R.; Tedesco, M. Evaluation of long term Northern Hemisphere snow water equivalent products. Cryosphere 2020, 12, 1579–1594.
11. Mudryk, L.; Derksen, C.; Kushner, P.J.; Brown, R. Characterization of Northern Hemisphere Snow Water Equivalent Datasets, 1981–2010. J. Clim. 2015, 28, 8037–8051.
12. Pulliainen, J.; Luojus, K.; Derksen, C.; Mudryk, L.; Lemmetyinen, J.; Salminen, M.; Ikonen, J.; Takala, M.; Cohen, J.; Smolander, T.; et al. Patterns and trends of Northern Hemisphere snow mass from 1980 to 2018. Nature 2020, 581, 294–298.
13. Broxton, P.D.; Leeuwen, W.J.D.v.; Biederman, J.A. Improving Snow Water Equivalent Maps With Machine Learning of Snow Survey and Lidar Measurements. Water Resour. Res. 2019, 55, 3739–3757.
14. Che, T.; Dai, L.; Zheng, X.; Li, X.; Zhao, K. Estimation of snow depth from passive microwave brightness temperature data in forest regions of northeast China. Remote Sens. Environ. 2016, 183, 334–349.
15. Cho, E.; Tuttle, S.E.; Jacobs, J.M. Evaluating Consistency of Snow Water Equivalent Retrievals from Passive Microwave Sensors over the North Central U.S.: SSM/I vs. SSMIS and AMSR-E vs. AMSR2. Remote Sens. 2017, 9, 465.
16. Larue, F.; Royer, A.; Sève, D.D.; Langlois, A.; Roy, A.; Brucker, L. Validation of GlobSnow-2 snow water equivalent over Eastern Canada. Remote Sens. Environ. 2017, 194, 264–277.
17. Snauffer, A.M.; Hsieh, W.W.; Cannon, A.J. Comparison of gridded snow water equivalent products with in situ measurements in British Columbia, Canada. J. Hydrol. 2016, 541, 714–726.
18. Tedesco, M.; Narvekar, P.S. Assessment of the NASA AMSR-E SWE product. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2010, 3, 141–159.
19. Dozier, J.; Bair, E.H.; Davis, R.E. Estimating the spatial distribution of snow water equivalent in the world’s mountains. Wiley Interdiscip. Rev. Water 2016, 3, 461–474.
20. Evora, N.D.; Tapsoba, D.; Sève, D.D. Combining Artificial Neural Network Models, Geostatistics, and Passive Microwave Data for Snow Water Equivalent Retrieval and Mapping. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1925–1939.
21. Viviroli, D.; Dürr, H.H.; Messerli, B.; Meybeck, M.; Weingartner, R. Mountains of the world—Water towers for humanity: Typology, mapping, and global significance. Water Resour. Res. 2007, 43, W07447.
22. Dee, D.P.; Uppala, S.M.; Simmons, A.J.; Berrisford, P.; Poli, P.; Kobayashi, S.; Andrae, U.; Balmaseda, M.A.; Balsamo, G.; Bauer, P.; et al. The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Q. J. R. Meteorol. Soc. 2011, 137, 553–597.
23. Li, Q.; Yang, T.; Zhang, F.; Qi, Z.; Li, L. Snow depth reconstruction over last century: Trend and distribution in the Tianshan Mountains, China. Glob. Planet. Chang. 2019, 173, 73–82.
24. Parker, W.S. Reanalyses and Observations: What’s the Difference? Bull. Am. Meteorol. Soc. 2016, 97, 1565–1572.
25. Wang, Y.; Huang, X.; Wang, J.; Zhou, M.; Liang, T. AMSR2 snow depth downscaling algorithm based on a multifactor approach over the Tibetan Plateau, China. Remote Sens. Environ. 2019, 231, 111268.
26. Zhu, L.; Zhang, Y.; Wang, J.; Tian, W.; Liu, Q.; Ma, G.; Kan, X.; Chu, Y. Downscaling Snow Depth Mapping by Fusion of Microwave and Optical Remote-Sensing Data Based on Deep Learning. Remote Sens. 2021, 13, 584.
27. Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N. Deep learning and process understanding for data-driven Earth system science. Nature 2019, 566, 195–204.
28. Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep learning in environmental remote sensing: Achievements and challenges. Remote Sens. Environ. 2020, 241, 111176.
29. Zhang, B.; Chen, Z.; Peng, D.; Benediktsson, J.A.; Liu, B.; Zou, L.; Li, J.; Plaza, A. Remotely Sensed Big Data: Evolution in Model Development for Information Extraction. Proc. IEEE 2019, 107, 2294–2301.
30. Tedesco, M.; Pulliainen, J.; Takala, M.; Hallikainen, M.; Pampaloni, P. Artificial neural network-based techniques for the retrieval of SWE and snow depth from SSM/I data. Remote Sens. Environ. 2004, 90, 76–85.
31. Aschbacher, J. Land Surface Studies and Atmospheric Effects by Satellite Microwave Radiometry. Ph.D. Thesis, University of Innsbruck, Innsbruck, Austria, 1989.
32. Chang, A.T.C.; Foster, J.L.; Hall, D.K. Nimbus-7 SMMR Derived Global Snowcover Parameters. Ann. Glaciol. 1987, 9, 39–44.
33. Cao, Y.; Yang, X.; Zhu, X. Retrieval snow depth by artificial neural network methodology from integrated AMSR-E and in-situ data—A case study in Qinghai-Tibet Plateau. Chinese Geograph. Sci. 2008, 18, 356–360.
34. Yang, J.; Jiang, L.; Luojus, K.; Pan, J.; Lemmetyinen, J.; Takala, M.; Wu, S. Snow depth estimation and historical data reconstruction over China based on a random forest machine learning approach. Cryosphere 2020, 14, 1763–1778.
35. Liang, J.; Liu, X.; Huang, K.; Li, X.; Shi, X.; Chen, Y.; Li, J. Improved snow depth retrieval by integrating microwave brightness temperature and visible/infrared reflectance. Remote Sens. Environ. 2015, 156, 500–509.
36. Xiao, X.; Zhang, T.; Zhong, X.; Li, X. Spatiotemporal Variation of Snow Depth in the Northern Hemisphere from 1992 to 2016. Remote Sens. 2020, 12, 2728.
37. Xiao, X.; Zhang, T.; Zhong, X.; Shao, W.; Li, X. Support vector regression snow-depth retrieval algorithm using passive microwave remote sensing data. Remote Sens. Environ. 2018, 210, 48–64.
38. Che, T.; Li, X.; Jin, R.; Armstrong, R.; Zhang, T. Snow depth derived from passive microwave remote-sensing data in China. Ann. Glaciol. 2008, 49, 145–154.
39. King, F.; Erler, A.R.; Frey, S.K.; Fletcher, C.G. Application of machine learning techniques for regional bias correction of snow water equivalent estimates in Ontario, Canada. Hydrol. Earth Syst. Sci. 2020, 24, 4887–4902.
40. Wrzesien, M.L.; Durand, M.T.; Pavelsky, T.M.; Kapnick, S.B.; Zhang, Y.; Guo, J.; Shum, C.K. A New Estimate of North American Mountain Snow Accumulation From Regional Climate Model Simulations. Geophys. Res. Lett. 2018, 45, 1423–1432.
41. Kelly, R. The AMSR-E snow depth algorithm: Description and initial results. J. Remote Sens. Soc. Jpn. 2009, 29, 307–317.
42. Dai, L.; Che, T.; Ding, Y. Inter-Calibrating SMMR, SSM/I and SSMI/S Data to Improve the Consistency of Snow-Depth Products in China. Remote Sens. 2015, 7, 7212–7230.
43. Gelaro, R.; McCarty, W.; Suárez, M.J.; Molod, A.; Takacs, L.; Randles, C.; Darmenov, A.; Bosilovich, M.G.; Reichle, R.; Wargan, K.; et al. The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). J. Clim. 2017, 30, 5419–5454.
44. Danielson, J.J.; Gesch, D.B. Global Multi-Resolution Terrain Elevation Data 2010 (GMTED2010); US Geological Survey: Reston, VA, USA, 2011.
45. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
46. Feng, J.; Wang, W.; Li, J. An LM-BP neural network approach to estimate monthly-mean daily global solar radiation using MODIS atmospheric products. Energies 2018, 11, 3150.
47. Feng, P.; Wang, B.; Liu, D.L.; Yu, Q. Machine learning-based integration of remotely-sensed drought factors can improve the estimation of agricultural drought in South-Eastern Australia. Agric. Syst. 2019, 2019, 303–316.
48. Gregorio, L.D.; Callegari, M.; Marin, C.; Zebisch, M.; Bruzzone, L.; Demir, B.; Strasser, U.; Marke, T.; Günther, D.; Nadalet, R.; et al. A novel data fusion technique for snow cover retrieval. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2862–2877.
49. Mateo-Pérez, V.; Corral-Bobadilla, M.; Ortega-Fernández, F.; Vergara-González, E.P. Port Bathymetry Mapping Using Support Vector Machine Technique and Sentinel-2 Satellite Imagery. Remote Sens. 2020, 12, 2069.
50. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
51. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
52. Qu, Y.; Zhu, Z.; Chai, L.; Liu, S.; Montzka, C.; Liu, J.; Yang, X.; Lu, Z.; Jin, R.; Li, X.; et al. Rebuilding a Microwave Soil Moisture Product Using Random Forest Adopting AMSR-E/AMSR2 Brightness Temperature and SMAP over the Qinghai–Tibet Plateau, China. Remote Sens. 2019, 11, 683.
53. Zhong, X.; Zhang, T.; Kang, S.; Wang, K.; Zheng, L.; Hu, Y.; Wang, H. Spatiotemporal variability of snow depth across the Eurasian continent from 1966 to 2012. Cryosphere 2018, 12, 227–245.
54. Cai, Y.; Guan, K.; Lobell, D.; Potgieter, A.B.; Wang, S.; Peng, J.; Xu, T.; Asseng, S.; Zhang, Y.; You, L.; et al. Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches. Agric. For. Meteorol. 2019, 274, 144–159.
55. Essery, R.; Kontu, A.; Lemmetyinen, J.; Dumon, M.; Ménard, C.B. A 7-year dataset for driving and evaluating snow models at an Arctic site (Sodankylä, Finland). Geosci. Instrum. Method. Data Syst. 2016, 5, 219–227.
56. Paul, A.B.; Murray, D.M.; Diana, L.V. Modified snow algorithms in the Canadian land surface scheme: Model runs and sensitivity analysis at three boreal forest stands. Atmos. Ocean. 2006, 44, 207–222.
57. Reba, M.L.; Marks, D.; Seyfried, M.; Winstral, A.; Kumar, M.; Flerchinger, G. A long-term data set for hydrologic modeling in a snow-dominated mountain catchment. Water Resour. Res. 2011, 47, 218–223.
58. Landry, C.C.; Buck, K.A.; Raleigh, M.S.; Clark, M.P. Mountain system monitoring at Senator Beck Basin, San Juan Mountains, Colorado: A new integrative data source to develop and evaluate models of snow and hydrologic processes. Water Resour. Res. 2014, 50, 1773–1788.
59. Wever, N.; Schmid, L.; Heilig, A.; Eisen, O.; Fierz, C.; Lehning, M. Verification of the multi-layer SNOWPACK model with different water transport schemes. Cryosphere 2015, 9, 2271–2293.
60. Tedesco, M.; Jeyaratnam, J. A New Operational Snow Retrieval Algorithm Applied to Historical AMSR-E Brightness Temperatures. Remote Sens. 2016, 8, 1037.
61. Ménard, C.B.; Essery, R.; Barr, A.; Bartlett, P.; Wever, N. Meteorological and evaluation datasets for snow modelling at ten reference sites: Description of in situ and bias-corrected reanalysis data. Earth Syst. Sci. Data 2019, 11, 865–880.

Figure 1. Land cover classification of the Northern Hemisphere (Forest, Grassland, Shrub, Bare-land and Unclassified) based on GlobCover 2009.

Figure 2. Flowchart of the proposed fusion approach for estimating snow depth over the Northern Hemisphere.

Figure 3. Average uncertainties of five commonly used snow depth datasets and the fused snow depth dataset compared with in situ observations from September to May from 2002 to 2011. Each sub-figure shows a different assessment index: (a) Coefficient of determination (R²); (b) Root Mean Squared Error (RMSE) and (c) Mean Absolute Error (MAE).

Figure 4. Density scatterplots of the fused snow depth and in situ observations between 2002 and 2011.

Figure 5. Histogram of the differences (BIAS) between the fused snow depth and in situ observations. BIAS values were divided into five classes; the numbers in the figure indicate the number of values in each class.

Figure 6. Spatial distribution of average BIAS between the fused snow depth and in situ observations.

Figure 7. Time series of fused snow depth (black lines) and in situ observations (red lines) at five sites.

Figure 8. The spatial patterns of the fused snow depth based on the Random Forest Regression (RFR) algorithm across the Northern Hemisphere from 2002 to 2011. (a) Average snow depth in all seasons, (b) Autumn (September to November) average snow depth, (c) Winter (December to February) average snow depth, and (d) Spring (March to May) average snow depth.

Figure 9. The average and maximum snow depth during the snow hydrology year from 2002 to 2011. Red line: monthly average snow depth; blue line: monthly maximum snow depth.

Table 1. Summary of the main snow depth datasets used in this study.

Dataset	AMSR-E	NHSD	GlobSnow	ERA-Interim	MERRA-2
Organization	NASA/JAXA	TPDC	ESA	ECMWF	NASA
Spatial coverage	0°–90°N	0°–90°N	35°–85°N	0°–90°N	0°–90°N
Spatial resolution	0.25° × 0.25°	0.25° × 0.25°	25 km × 25 km	0.25° × 0.25°	0.5° × 0.625°
Projection/Datum	WGS-84	WGS-84	EASE-GRID	WGS-84	WGS-84
Time resolution	Daily	Daily	Daily	6 h	Daily
Parameter transformation	SD	SD	SWE/ρ	SWE/ρ	SD * × fsc
Algorithm/Model	Improved Chang algorithm	Improved Chang algorithm	HUT, model assimilation	TESSEL	NSIPP

ρ represents snow density, ‘SD’ denotes the average snow depth in a 0.25° × 0.25° pixel, ‘SWE’ stands for the average snow water equivalent in one pixel, ‘SD *’ denotes the average snow depth in snow-covered area of a pixel and ‘fsc’ stands for fraction of snow cover in a pixel.
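The two parameter transformations listed in Table 1 can be written out as a short sketch. The unit choices (SWE in mm of water equivalent, density in kg/m³, depth in cm), the function names and the numerical values in the example are assumptions for illustration; the density used for each product is not stated in this section.

```python
def depth_from_swe(swe_mm, density_kg_m3):
    """SD = SWE / rho, returned in cm.

    Assumes SWE in mm of water equivalent (numerically equal to kg/m^2)
    and bulk snow density in kg/m^3, so SWE / rho is a depth in metres.
    """
    if density_kg_m3 <= 0:
        raise ValueError("snow density must be positive")
    return swe_mm / density_kg_m3 * 100.0


def pixel_mean_depth(depth_snow_covered_cm, fsc):
    """SD = SD* x fsc: scale the snow-covered-area depth by snow cover fraction."""
    if not 0.0 <= fsc <= 1.0:
        raise ValueError("fraction of snow cover must lie in [0, 1]")
    return depth_snow_covered_cm * fsc


# Made-up values for illustration only (not data from the paper).
sd_from_swe = depth_from_swe(48.0, 240.0)   # 48 mm SWE at 240 kg/m^3
sd_pixel = pixel_mean_depth(30.0, 0.5)      # 30 cm over half of the pixel
print(sd_from_swe, sd_pixel)
```

In this toy case both transformations give a pixel-average depth that can be compared directly with the products that already report SD.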

Table 2. Uncertainties of the three models for different land cover types and seasons: the coefficient of determination (R²), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE) of the three fused snow depth datasets against in situ observations. The two numbers within brackets are the numbers of samples used in training and prediction, respectively.

(a) Bare-Land		September to November			December to February			March to May
		(99,706, 99,060)			(106,723, 105,087)			(97,442, 95,219)
		R²	MAE/cm	RMSE/cm	R²	MAE/cm	RMSE/cm	R²	MAE/cm	RMSE/cm
ANN	train	0.61	1.4	4.1	0.73	2.4	12.5	0.72	5.3	18.7
ANN	predict	0.45	1.7	4.2	0.50	5.5	13.6	0.62	8.3	22.7
SVR	train	0.73	1.0	4.2	0.56	4.0	19.3	0.53	9.2	21.7
SVR	predict	0.43	1.0	4.5	0.34	6.8	25.0	0.42	10.8	23.5
RFR	train	0.93	0.7	2.8	0.95	1.9	10.0	0.95	1.1	2.7
RFR	predict	0.78	0.6	3.8	0.81	2.3	10.5	0.82	1.8	4.4
(b) Shrub		September to November			December to February			March to May
		(88,818, 81,100)			(98,666, 94,869)			(109,027, 106,029)
		R²	MAE/cm	RMSE/cm	R²	MAE/cm	RMSE/cm	R²	MAE/cm	RMSE/cm
ANN	train	0.66	0.3	1.3	0.64	4.1	9.9	0.66	5.0	13.2
ANN	predict	0.32	0.5	1.7	0.55	7.4	17.0	0.55	6.4	17.8
SVR	train	0.77	0.5	1.8	0.65	2.8	6.2	0.65	4.5	16.1
SVR	predict	0.42	0.6	2.6	0.32	7.5	8.5	0.45	6.1	20.9
RFR	train	0.91	0.2	0.5	0.90	2.3	4.8	0.95	2.5	1.4
RFR	predict	0.71	0.2	1.2	0.71	4.7	6.1	0.78	4.3	3.3
(c) Grassland		September to November			December to February			March to May
		(61,511, 60,531)			(51,390, 50,487)			(59,627, 59,285)
		R²	MAE/cm	RMSE/cm	R²	MAE/cm	RMSE/cm	R²	MAE/cm	RMSE/cm
ANN	train	0.81	1.5	2.0	0.78	5.1	9.9	0.78	6.2	12.7
ANN	predict	0.56	1.8	2.7	0.52	6.5	13.1	0.59	8.5	20.1
SVR	train	0.77	0.5	1.8	0.64	5.0	13.4	0.70	7.1	18.9
SVR	predict	0.42	0.8	2.6	0.48	10.1	21.3	0.56	9.6	23.7
RFR	train	0.88	0.2	1.5	0.92	2.5	1.2	0.96	1.5	3.7
RFR	predict	0.71	0.2	3.2	0.81	5.1	2.6	0.85	2.8	5.1
(d) Forest		September to November			December to February			March to May
		(159,146, 157,121)			(195,542, 197,884)			(196,501, 193,886)
		R²	MAE/cm	RMSE/cm	R²	MAE/cm	RMSE/cm	R²	MAE/cm	RMSE/cm
ANN	train	0.73	0.9	2.5	0.73	16.3	22.1	0.67	18.4	31.4
ANN	predict	0.43	1.0	3.3	0.67	17.5	25.6	0.57	21.7	36.4
SVR	train	0.75	0.7	3.0	0.64	14.7	26.3	0.60	20.6	40.1
SVR	predict	0.33	0.9	3.4	0.42	18.7	33.1	0.47	22.8	42.6
RFR	train	0.85	0.1	0.5	0.95	1.0	2.0	0.96	1.6	3.3
RFR	predict	0.66	0.5	2.1	0.80	2.0	2.8	0.80	2.4	4.7
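
The three indices reported in Table 2 can be computed as in the minimal sketch below. The exact formulas are not given in this part of the article, so the sketch assumes the standard definitions, with R² taken as 1 − SS_res/SS_tot (the squared correlation coefficient is another common convention). The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, MAE and RMSE of predictions against observations.

    R^2 is computed as 1 - SS_res / SS_tot, which is an assumption
    about the convention used; MAE and RMSE share the input units.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"r2": r2, "mae": mae, "rmse": rmse}


# Hypothetical snow depths in cm, for illustration only
in_situ = [10.0, 20.0, 30.0, 40.0]
fused = [12.0, 18.0, 33.0, 37.0]
print(regression_metrics(in_situ, fused))
```

Because RMSE squares the errors before averaging, it is never smaller than MAE on the same sample, which is a quick consistency check when reading tables of this kind.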

Table 3. Different schemes of input variables for Random Forest Regression.

Scheme	Input Variables	Variable Excluded
1	Longitude, Elevation, AMSR-E, NHSD, GlobSnow, ERA-Interim, MERRA-2	Latitude
2	Latitude, Elevation, AMSR-E, NHSD, GlobSnow, ERA-Interim, MERRA-2	Longitude
3	Latitude, Longitude, AMSR-E, NHSD, GlobSnow, ERA-Interim, MERRA-2	Elevation
4	Latitude, Longitude, Elevation, NHSD, GlobSnow, ERA-Interim, MERRA-2	AMSR-E
5	Latitude, Longitude, Elevation, AMSR-E, GlobSnow, ERA-Interim, MERRA-2	NHSD
6	Latitude, Longitude, Elevation, AMSR-E, NHSD, ERA-Interim, MERRA-2	GlobSnow
7	Latitude, Longitude, Elevation, AMSR-E, NHSD, GlobSnow, MERRA-2	ERA-Interim
8	Latitude, Longitude, Elevation, AMSR-E, NHSD, GlobSnow, ERA-Interim	MERRA-2
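
The eight schemes in Table 3 follow a leave-one-variable-out pattern, which can be generated programmatically. The sketch below only builds the input lists; retraining a model on each list is left out, and the function name is hypothetical.

```python
ALL_VARIABLES = [
    "Latitude", "Longitude", "Elevation",
    "AMSR-E", "NHSD", "GlobSnow", "ERA-Interim", "MERRA-2",
]


def leave_one_out_schemes(variables):
    """Build one scheme per variable, each omitting exactly that variable."""
    schemes = []
    for number, excluded in enumerate(variables, start=1):
        inputs = [v for v in variables if v != excluded]
        schemes.append({"scheme": number, "inputs": inputs, "excluded": excluded})
    return schemes


for scheme in leave_one_out_schemes(ALL_VARIABLES):
    print(scheme["scheme"], ", ".join(scheme["inputs"]), "|", scheme["excluded"])
```

With the variables ordered as above, the scheme numbers and excluded variables match the rows of Table 3.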

Table 4. Accuracy comparison of different input variables.

Priori conditions: land cover type, bare-land; season, December to February.

Input Variables	R²	RMSE/cm	MAE/cm
All variables	0.81	10.5	2.3
Elevation excluded	0.75	12.8	3.4
Latitude excluded	0.78	11.2	3.0
Longitude excluded	0.78	11.1	2.9
AMSR-E excluded	0.79	11.0	2.7
NHSD excluded	0.79	10.9	2.8
GlobSnow excluded	0.76	12.6	3.3
ERA-Interim excluded	0.80	10.7	2.6
MERRA-2 excluded	0.77	12.4	3.1
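
One way to read Table 4 is to compare each exclusion run with the all-variables run. The sketch below ranks the variables by the increase in RMSE when each is left out, using the values from the table; treating a larger increase as a sign of a more influential variable is an interpretation made for this example, and the function name is hypothetical.

```python
# Values from Table 4 (bare-land, December to February): (R², RMSE/cm, MAE/cm)
BASELINE = (0.81, 10.5, 2.3)  # all variables
EXCLUDED = {
    "Elevation": (0.75, 12.8, 3.4),
    "Latitude": (0.78, 11.2, 3.0),
    "Longitude": (0.78, 11.1, 2.9),
    "AMSR-E": (0.79, 11.0, 2.7),
    "NHSD": (0.79, 10.9, 2.8),
    "GlobSnow": (0.76, 12.6, 3.3),
    "ERA-Interim": (0.80, 10.7, 2.6),
    "MERRA-2": (0.77, 12.4, 3.1),
}


def rank_by_rmse_increase(baseline, excluded):
    """Sort variables by RMSE increase (cm) when each one is excluded."""
    base_rmse = baseline[1]
    increases = {
        name: round(metrics[1] - base_rmse, 1)  # round away float noise
        for name, metrics in excluded.items()
    }
    return sorted(increases.items(), key=lambda item: item[1], reverse=True)


for name, increase in rank_by_rmse_increase(BASELINE, EXCLUDED):
    print(f"{name}: +{increase} cm")
```

On these numbers, excluding elevation, GlobSnow, or MERRA-2 raises the RMSE by about 2 cm, while excluding any of the other variables raises it by less than 1 cm.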

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).


Hu, Y.; Che, T.; Dai, L.; Xiao, L. Snow Depth Fusion Based on Machine Learning Methods for the Northern Hemisphere. Remote Sens. 2021, 13, 1250. https://doi.org/10.3390/rs13071250

