Simulation of Daily Snow Depth Data in China Based on the NEX-GDDP

Abstract: In this study, a backpropagation artificial neural network snow simulation model (BPANNSIM) is built using data collected from the National Climate Reference Stations to simulate China’s future daily snow depth under two representative concentration pathways (RCP4.5 and RCP8.5). The input layer of the BPANNSIM comprises the current day’s maximum temperature, minimum temperature, snow depth, and precipitation, and the target layer comprises the snow depth of the following day. The model is trained and validated using National Climate Reference Station data over a baseline period of 1986–2005. Validation results show that the temporal correlations between the observed values and the iteratively simulated values are 0.94 for monthly cumulative snow cover duration and 0.88 for monthly cumulative snow depth. Subsequently, future daily snow depth data (2016–2065) are derived from the NEX-GDDP dataset (NASA Earth Exchange/Global Daily Downscaled Projections; National Aeronautics and Space Administration, Washington, DC, USA). The error of the simulated data is found to be highly correlated with that of the input data; thus, a validation method for gridded meteorological data is proposed to verify the accuracy of gridded meteorological data within snowfall periods and the reasonableness of their hydrothermal coupling.


Introduction
Snow cover is an important component of the cryosphere and an indicator of climate change [1], as its properties change rapidly in response to changes in heat and water at the earth's surface [2][3][4]. Snowfall also has important socioeconomic impacts; for example, insufficient snowfall in spring can lead to drought, whereas excessive snowfall can cause disasters, such as snowmelt floods, and major property losses [5][6][7][8][9]. Therefore, the effective prediction and detection of snow parameters are crucial for alleviating or minimizing these effects. At present, the primary methods of acquiring snow parameter data include (1) on-site observations at meteorological stations [10]; (2) optical remote sensing of snow cover extent, which uses the high reflectivity of snow in the visible band and its low reflectivity in the NIR band to define a normalized difference snow index (NDSI) [11]; (3) passive microwave remote sensing for global/regional snow depth and snow water equivalent observations [12][13][14]; and (4) fusion of optical and microwave remote sensing inversions [15]. Thus, snow depth data acquisition mainly depends on two approaches: remote sensing observations and on-site observations. Because snow cover is an important indicator under changing global climate conditions, scientists have carried out numerous studies examining the change in snow cover over historical periods [3,4,6,16], the relationship between snow cover change and climate, and the impact of snow cover change on human productivity.
Some analyses have also studied future changes in snow cover based on the experimental data of the Coupled Model Intercomparison Project (CMIP) organized by the World Climate Research Programme (WCRP) [17,18]; however, there are few studies on the future daily snow depth, as snow accumulation and melting is a complex process affected by many factors, such as temperature, precipitation, wind speed, solar radiation, underlying surface type, and altitude [11,15,19].
Here, future daily snow depth was simulated using NASA Earth Exchange/Global Daily Downscaled Projections (NEX-GDDP) data. The factors affecting the snow accumulation process were selected in accordance with Leathers and Luff, who found that the duration of snow is highly correlated with snowfall and temperature [20]. Accordingly, snowfall (the weight equivalent of solid precipitation in the NEX-GDDP data) and temperature were selected as the input variables for simulating snow depth. A backpropagation neural network snow simulation model (BPNNSIM) was built using MATLAB, and the current day's snow depth, daily minimum air temperature, daily maximum air temperature, and daily precipitation were used to predict the next day's snow depth. The neural network was trained, and model accuracy was verified, with Climate Reference Station data. Based on the BPNNSIM, the NEX-GDDP data were then used as model input to simulate future snow depth in China. The NEX-GDDP data comprise the first multi-model high-resolution dataset based on Coupled Model Intercomparison Project Phase 5 (CMIP5), released by NASA in 2015 (Table 1). A statistical downscaling method was used to downscale the daily precipitation and the daily maximum and minimum air temperature data from 21 CMIP5 models to a spatial resolution of 0.25° × 0.25°, covering the historical period from 1986 to 2005 and two future climate scenarios (RCP4.5 and RCP8.5) over the projection period from 2006 to 2100 [21]. NEX-GDDP data have a higher and more uniform spatial resolution than CMIP5 data, and many studies have shown that they reflect the characteristics of regional climate change in China better than the direct use of CMIP5 data [22][23][24]. CMIP6 data are in the release and preliminary application stage, and the resolution varies greatly between models [25]; thus, the consistently higher and more uniform resolution of NEX-GDDP will continue to maintain high application value.
Accordingly, the future daily snow depth dataset simulated using the NEX-GDDP can inform future research on snow cover and snow disaster risk assessment. This paper is structured as follows: methods and data are presented in Section 2; Section 3 presents the results, including BPNNSIM construction, validation, and simulation of future daily snow depths in China; sources of error in the simulated snow depth data are discussed in Section 4; and conclusions are presented in Section 5.
The long-term snow depth dataset of China was derived from passive microwave remote sensing data, which provides the daily snow depth distribution in China from 24 October 1978 to 31 December 2012, at a spatial resolution of 0.25° × 0.25° [13].

Construction of Snow Depth Simulation Model
Deep learning is a recent research direction in the field of machine learning, where the rapid development of computer technology has made artificial intelligence (AI) more attainable. Recently, artificial neural networks (ANNs) have been applied in various fields and are widely used in geographical research [26,27]. Here, a backpropagation (BP) network was selected and built using MATLAB (Figure 2). A total of 155 sites were used to train the BP neural network, comprising 1,133,050 training and test samples.

1. Selection of input and output layer variables. The input layer included the daily maximum and minimum temperatures, daily precipitation, and daily snow depth from the National Climate Reference Stations, and the output layer was the next day's snow depth at the same stations (Figure 3, Part 1).

2. Determining the number of hidden layer nodes for network training (Equation (1)):

$$p = \sqrt{n + m} + a \qquad (1)$$

where p is the number of hidden nodes, n is the number of input layer nodes (n = 4 here), m is the number of output layer nodes (m = 1 here), and a is a constant in the range from 1 to 10. Trial and error showed that the training effect was best when p = 10.

3. Setting the network training parameters, which is critical to model accuracy. To this end, results for the training and test samples were compared over numerous trials to determine the parameter values of the BPNNSIM (Table 2).
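The three steps above can be illustrated with a minimal sketch in Python. It is not the authors' MATLAB implementation: the tanh hidden layer, linear output, learning rate, and the synthetic training rule are all assumptions chosen for illustration, and the candidate hidden-layer sizes assume the common empirical rule p = sqrt(n + m) + a for Equation (1). The names `hidden_node_candidates` and `TinyBPNetwork` are hypothetical.

```python
import math
import random


def hidden_node_candidates(n_inputs, n_outputs):
    """Candidate hidden-layer sizes from p = sqrt(n + m) + a, a = 1..10 (assumed rule)."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in range(1, 11)]


class TinyBPNetwork:
    """One tanh hidden layer and a linear output, trained by stochastic backpropagation."""

    def __init__(self, n_inputs=4, n_hidden=10, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def train_epoch(self, samples, lr=0.05):
        """One pass over the samples; returns the mean squared error seen during the pass."""
        total = 0.0
        for x, target in samples:
            hidden, output = self.forward(x)
            err = output - target
            total += err * err
            for j, h in enumerate(hidden):
                # Error propagated back to hidden node j (uses w2[j] before its update).
                grad_hidden = err * self.w2[j] * (1.0 - h * h)
                self.w2[j] -= lr * err * h
                self.b1[j] -= lr * grad_hidden
                for i, xi in enumerate(x):
                    self.w1[j][i] -= lr * grad_hidden * xi
            self.b2 -= lr * err
        return total / len(samples)


# Synthetic, normalized samples: [tmax, tmin, snow depth, precipitation] -> next-day depth.
rng = random.Random(1)
samples = []
for _ in range(200):
    tmax, tmin, depth, precip = (rng.random() for _ in range(4))
    target = 0.8 * depth + 0.3 * precip * (1.0 - tmax)  # toy rule, not from the paper
    samples.append(([tmax, tmin, depth, precip], target))

net = TinyBPNetwork(n_inputs=4, n_hidden=10)
history = [net.train_epoch(samples) for _ in range(50)]
print(hidden_node_candidates(4, 1))
print(history[0], history[-1])
```

With n = 4 and m = 1, the candidate list spans 3 to 12 nodes, so the value p = 10 reported above falls inside the range searched by trial and error.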

Verification of Model Iteration Simulation Accuracy
Following model building and training, the precision of the iterative simulation was evaluated using daily precipitation, daily minimum temperature, and daily maximum temperature data from 30 randomly selected National Climate Reference Stations. A day without snow cover was used as the starting time of the iterative simulation, and simulated snow depths were then obtained at a daily time step (Figure 4). The simulated snow depths were compared with the observations from the corresponding stations (Figure 3, Part 2) according to the following process. First, multisite monthly values of cumulative snow depth and snow cover days were calculated using the simulated and observed values from the 30 validation sites. Next, the Nash and correlation coefficients [28] between the simulated and observed values across the time series were calculated according to Equations (2)-(5):

$$E = 1 - \frac{\sum_{t=1}^{n} \left(S_o^t - S_m^t\right)^2}{\sum_{t=1}^{n} \left(S_o^t - \bar{S}_o\right)^2}$$

where E is the Nash coefficient, $S_o^t$ is the observed value in month t, $S_m^t$ is the simulated value in month t, $\bar{S}_o$ is the overall mean of the observations, and n is the number of months. If E is close to 1, the model quality is high and the model is credible. If E is close to 0, the simulation is close to the mean level of the observed values, although process errors are large. If E is much less than zero, the model is not credible.
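The correlation value (r) is calculated as follows:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $x_i$ refers to the simulated value, $\bar{x}$ refers to the average of the simulations, $y_i$ refers to the observed value, $\bar{y}$ refers to the average of the observations, and n refers to the number of months.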
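The two skill scores defined above can be computed as in the following sketch. The function names and the monthly values are illustrative assumptions, not data from this study; the formulas are the standard Nash coefficient and Pearson correlation.

```python
import math


def nash_sutcliffe(simulated, observed):
    """Nash coefficient E: 1 minus the ratio of residual to observed variance."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for s, o in zip(simulated, observed))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up monthly cumulative snow depths (cm) for illustration only.
observed = [2.0, 10.0, 20.0, 14.0, 4.0]
simulated = [3.0, 9.0, 17.0, 13.0, 5.0]

print(nash_sutcliffe(simulated, observed))
print(pearson_r(simulated, observed))
```

A simulation that reproduces the observations exactly gives E = 1, and a simulation that always returns the observed mean gives E = 0, which matches the interpretation of E given above.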

A Comparative Method for Remote Sensing and NEX-GDDP Snow Depth
First, annual snow depth observations from the climate stations were used to calculate a standard value of cumulative snow depth for all stations. Next, annual averages of cumulative snow depth at the climate station locations were calculated from the snow depth derived from NEX-GDDP or microwave remote sensing data, and these were regarded as the simulated values for the corresponding NEX-GDDP model or remote sensing dataset. Lastly, the differences between the standard and simulated values were analyzed using the root mean square error (RMSE) and the correlation coefficient:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2}$$

where $x_i$ denotes the simulated value, $y_i$ the observed value, and n the number of years; a smaller RMSE indicates better simulation capability.
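As a sketch of the comparison described above, the following Python snippet computes the RMSE of several candidate annual series against station observations and ranks them. The series names `model_a` and `model_b` and all values are hypothetical placeholders, not results from this study.

```python
import math


def rmse(simulated, observed):
    """Root mean square error between simulated and observed annual values."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n)


def rank_by_rmse(candidates, observed):
    """Return (name, RMSE) pairs sorted from smallest to largest error."""
    scores = {name: rmse(series, observed) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical annual cumulative snow depths (cm per year).
observed = [170.0, 185.0, 178.0, 190.0]
candidates = {
    "model_a": [150.0, 160.0, 155.0, 170.0],
    "model_b": [172.0, 180.0, 181.0, 186.0],
}

ranking = rank_by_rmse(candidates, observed)
print(ranking)
```

The first entry of `ranking` is the series with the smallest RMSE, which corresponds to the best simulation capability under the criterion above.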

Model Building
Model accuracy peaked at 904 training cycles, with a minimum validation mean square error (MSE) of 0.22 (Figure 5). The correlation coefficients (r) between the simulated and actual values for the trained BP neural network were all >0.95.

Validation Results of Model Iterative Simulation Capabilities
Figures 6 and 7 show the correlation between the iterative simulations and observed values of cumulative monthly snow depth and snow cover days during the principal snow cover months (October to April). Simulated results were underestimated at larger cumulative snow depths and overestimated for cumulative snow cover days. The multiyear monthly average of observed cumulative snow depth across all stations was 13.77 cm, compared to a modeled value of 12.40 cm. The monthly average of observed cumulative snow cover days was 4.06 days, compared to a modeled value of 4.98 days.

To clarify the regional differences in model simulation capabilities, the observed and iteratively simulated average values of cumulative snow depth and snow cover duration for all stations within each provincial unit were tallied. The simulated cumulative snow depth was higher than the observations in some provinces and lower in others, whereas cumulative snow cover duration was consistently overestimated across all provinces. At the provincial scale, the correlation coefficient between the observed and simulated monthly average values of cumulative snow depth was 0.93 (R² = 0.85), and that for cumulative snow cover duration was 0.97 (R² = 0.93) (Figures 9 and 10).
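The monthly quantities compared above can be aggregated from a daily series as in the following sketch. The 1 cm threshold for counting a snow cover day is an assumption, since the text does not state the threshold used, and the function name and sample values are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_snow_summary(start, daily_depth_cm, cover_threshold_cm=1.0):
    """Return {(year, month): (cumulative snow depth, snow cover days)}.

    cover_threshold_cm is an assumed cutoff for counting a day as snow covered.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for offset, depth in enumerate(daily_depth_cm):
        day = start + timedelta(days=offset)
        key = (day.year, day.month)
        totals[key][0] += depth
        if depth >= cover_threshold_cm:
            totals[key][1] += 1
    return {key: (depth_sum, days) for key, (depth_sum, days) in totals.items()}


# Four illustrative days spanning a month boundary.
summary = monthly_snow_summary(date(2000, 1, 30), [0.0, 2.0, 3.0, 0.5])
print(summary)
```

Running the same aggregation on the simulated and observed daily series yields the paired monthly values from which the correlation and Nash coefficients are computed.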

Daily Snow Depth Simulation Based on NEX-GDDP
Based on the spatiotemporal accuracy evaluation of the iterative simulation, the NEX-GDDP data were selected as the input values for the BPNNSIM to iteratively simulate daily snow depth for each corresponding model in NEX-GDDP. The simulated data correspond to the structure of NEX-GDDP, which contains 21 models under two climate scenarios (RCP4.5 and RCP8.5), at a spatial resolution of 0.25° × 0.25° for the periods of 1986-2005 and 2016-2065. The average annual snow depth distribution in China based on NEX-GDDP data is shown in Figure 11, which reveals that the future coverage under the RCP4.5 scenario is similar to that of the historical period. Under the RCP8.5 scenario, the coverage shows notable southward expansion, whereas Northeast and Northwest China, as well as the Qinghai-Tibet Plateau, remain the primary snowfall regions in the country.
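The iterative simulation described above feeds each predicted snow depth back into the model as the next day's input. A minimal sketch of that loop follows; `toy_step_model` is a hypothetical stand-in for the trained network, and clamping the depth at zero is an assumption added so the sketch never produces negative depths.

```python
def simulate_snow_depth(step_model, tmax, tmin, precip, initial_depth=0.0):
    """Iteratively predict snow depth, reusing each prediction as the next input.

    step_model(tmax, tmin, depth, precip) returns the following day's depth.
    Entry i of the result is the depth predicted for the day after input day i.
    """
    depths = []
    depth = initial_depth  # start from a day without snow cover
    for day_tmax, day_tmin, day_precip in zip(tmax, tmin, precip):
        depth = max(0.0, step_model(day_tmax, day_tmin, depth, day_precip))
        depths.append(depth)
    return depths


def toy_step_model(tmax, tmin, depth, precip):
    """Illustrative rule only: accumulate when freezing, otherwise melt with tmax."""
    if tmax <= 0.0:
        return depth + precip
    return depth - 0.5 * tmax


result = simulate_snow_depth(
    toy_step_model,
    tmax=[-5.0, -3.0, 2.0, 4.0],
    tmin=[-12.0, -10.0, -4.0, -1.0],
    precip=[3.0, 1.0, 0.0, 0.0],
)
print(result)  # [3.0, 4.0, 3.0, 1.0]
```

Because each step depends on the previous prediction, errors in the temperature or precipitation input accumulate over the snow season, which is why the accuracy of the iterative simulation was evaluated separately from the one-step training accuracy.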

Comparison of Remote Sensing and NEX-GDDP Snow Depth
To clarify the differences between NEX-GDDP and remotely derived snow depth data, the NEX-GDDP-based dataset was compared with the long-term snow depth dataset of China for the period from 1986 to 2005 [13]. Figure 12 shows the time series variability of annual cumulative snow depth data from the simulations and station observations. The multi-year average of cumulative snow depth was 178.06 cm·y⁻¹ from the observation stations, 266.25 cm·y⁻¹ from remote sensing data, and 75.55 cm·y⁻¹ from NEX-GDDP data. In general, the snow depth values from remote sensing data were higher than the observations, whereas those from NEX-GDDP data were lower.
Simulation accuracy of snow depth data varied by model in the NEX-GDDP dataset. In terms of RMSE, the two models with the smallest simulation errors were GFDL-ESM2G and MPI-ESM-MR (29.88 and 29.97 cm, respectively). In terms of r, the two best models were GFDL-ESM2G and bcc-csm1-1 (0.58 and 0.50, respectively). Thus, the most accurate model was GFDL-ESM2G. For comparison, the RMSE and r of the remotely derived snow depth estimates were 28.78 and 0.52, respectively (Table 3). The difference in accuracy between the two snow depth datasets was therefore insignificant, and both show a similar ability to depict cumulative snow depth and cumulative snow cover days over China.

Note: * and ** denote correlations significant at the p < 0.05 and p < 0.01 levels, respectively.

Simulated Snow Depth Error Sources in GFDL-ESM2G Model
Because the GFDL-ESM2G model provided the most accurate snow depth simulation and is therefore the most convenient choice for future use, its sources of error deserve discussion. First, at the provincial scale, the difference in snowfall (DS, cm) and the difference in accumulated snow time (DST, days) between the snow depth data from the GFDL-ESM2G simulation and the meteorological stations were calculated at the station points. Figures 13 and 14 show that the average annual snowfall levels from the GFDL-ESM2G simulation in Tibet, Yunnan, Shandong, Sichuan, Qinghai, and Liaoning provinces were higher than the observed values, whereas in Inner Mongolia, Henan, Jilin, Xinjiang, and Heilongjiang provinces, estimates were lower than the observations. Except for Tibet (where DS = 437 cm·year⁻¹), the simulated snow duration according to the GFDL-ESM2G model was shorter than the station-measured duration in the primary snowfall provinces.

The snow depth simulation model is based on precipitation falling as snow once the temperature has dropped to a certain value. As the model input variables were daily precipitation and daily maximum and minimum temperatures, the amount of precipitation directly affected the amount of snowfall, while the snow cover duration was determined by a combination of air temperature and snowfall. Accordingly, to explain the differences in DS and DST across regions, the difference in precipitation (DP) and the difference in temperature (DT) between the GFDL-ESM2G data and the meteorological station data during snowfall periods must be quantified.

Relationship between DS and DP
DP was first analyzed across different provinces during snowfall periods, and based on the initial results, the correlation between DP and DS was then examined. The results showed that the modeled precipitation of GFDL-ESM2G was higher than that of the meteorological station observations over Sichuan, Qinghai, and Tibet during the snowfall period; however, the remainder of the major snowfall provinces displayed the opposite pattern (Figure 15). Further, the correlation coefficient between DP and DS was 0.894 (R² = 0.80, Figure 16).

Relationship between DST, DT, and DP
DT for the daily maximum and minimum temperatures was also calculated under different snow depth conditions, with the results showing that the temperature values from GFDL-ESM2G were higher than those from the meteorological stations in the major snow provinces during snowfall periods. In the provinces of Liaoning, Xinjiang, Inner Mongolia, Jilin, and Heilongjiang, DS and DT of both the daily maximum and minimum temperatures were positively correlated; however, other major snowfall provinces showed the opposite trend (Table 4, Figure 17). Accordingly, the relationship between average DT, DS, and DST was further studied by multi-factor analysis, yielding the relationship given in Equation (7), where L is the DST, T is the DT, and S is the DS; the fitted formula has an R² of 0.912. From Equation (7), it was determined that DT is the most influential factor for DST where differences in the snowfall amount are relatively small. Therefore, excluding Tibet, a region with a large DS, the relationship between DST and DT was analyzed for the other provinces as well, revealing that the higher the DT, the lower the DST value (Figure 18). More specifically, DT and DST showed a negative correlation (R² = 0.811), a result consistent with the findings of Leathers and Luff [20], who concluded that the duration of snow is highly correlated with snowfall and temperature.

Collectively, the error sources of daily snow depth data from GFDL-ESM2G varied by province, primarily taking the form of precipitation data errors during snowfall periods over Tibet and temperature data errors in the northeastern provinces and Xinjiang. Overall, the coupling of precipitation and temperature data from the GFDL-ESM2G model was relatively poor during snowfall periods.
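A multi-factor fit of this kind can be sketched as an ordinary least-squares regression of L on T and S. The linear form L = b0 + b1·T + b2·S is an assumption made only for illustration, since the functional form and coefficients of Equation (7) are not reproduced in this text, and the sample values below are synthetic.

```python
def fit_two_factor(T, S, L):
    """Least-squares coefficients [b0, b1, b2] for L = b0 + b1*T + b2*S."""
    rows = [[1.0, t, s] for t, s in zip(T, S)]
    # Augmented normal equations (X^T X | X^T L), a 3 x 4 system.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * l for r, l in zip(rows, L))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def r_squared(T, S, L, coeffs):
    """Coefficient of determination of the fitted two-factor model."""
    b0, b1, b2 = coeffs
    mean_l = sum(L) / len(L)
    ss_res = sum((l - (b0 + b1 * t + b2 * s)) ** 2 for t, s, l in zip(T, S, L))
    ss_tot = sum((l - mean_l) ** 2 for l in L)
    return 1.0 - ss_res / ss_tot


# Synthetic provincial values generated from L = 2 - 3*T + 0.5*S (not study data).
T = [0.0, 1.0, 2.0, 3.0, 1.5]
S = [1.0, 0.0, 2.0, 1.0, 4.0]
L = [2.0 - 3.0 * t + 0.5 * s for t, s in zip(T, S)]

coeffs = fit_two_factor(T, S, L)
print(coeffs, r_squared(T, S, L, coeffs))
```

In a fit like this, a negative coefficient on T would correspond to the negative relationship between DT and DST reported above.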

Accuracy of the Gridded Meteorological Data
This study implemented a novel approach for verifying the accuracy of raster temperature and precipitation data in alpine regions. Currently, such validations are primarily based on either meteorological station data or interpolated gridded data derived from these stations; however, station-based validation methods often do not reflect the integrated accuracy of gridded data, especially in poorly represented areas or those with high terrain variability [29,30]. Accordingly, the new method for raster meteorological data validation proposed here was based on the results of the current study and designed according to the following process. First, highly accurate daily snow depth values are obtained via remote sensing, photogrammetry, or station observation data. Second, the gridded meteorological data to be validated are converted into daily snow depth data using the model developed in the present study. Third, the DS or the DST between the two snow depth datasets is calculated. Finally, the differences between the gridded meteorological data being validated and the true weather data are inferred from the relationships between DS, DST, DP, and DT.
The following points should be noted when implementing this verification method: (1) The gridded meteorological data can only be verified during the snowfall period; thus, this method is most suitable for alpine weather data verification. (2) The validation error of this method is causally related to the grid cell size of the meteorological data; accordingly, obtaining high-quality daily snow data is an important prerequisite for ensuring validation accuracy.
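The third step of the process above, computing DS and DST between two daily snow depth series, can be sketched as follows. Here DS is taken as the difference in cumulative snow depth and DST as the difference in snow cover days, both as simulated minus reference; these definitions and the 1 cm snow cover threshold are assumptions for illustration, and the sample values are made up.

```python
def snow_differences(gridded_depth, reference_depth, cover_threshold_cm=1.0):
    """Return (DS, DST) as gridded-derived minus reference values.

    DS:  difference in cumulative snow depth (cm).
    DST: difference in the number of snow cover days (days).
    cover_threshold_cm is an assumed cutoff for a snow cover day.
    """
    ds = sum(gridded_depth) - sum(reference_depth)
    days_grid = sum(1 for d in gridded_depth if d >= cover_threshold_cm)
    days_ref = sum(1 for d in reference_depth if d >= cover_threshold_cm)
    return ds, days_grid - days_ref


# Snow depth converted from gridded data versus a reference series (cm).
gridded = [0.0, 2.0, 5.0, 3.0, 0.0]
reference = [0.0, 1.0, 4.0, 4.0, 2.0]

ds, dst = snow_differences(gridded, reference)
print(ds, dst)  # -1.0 -1
```

Given the relationships reported in this study, a negative DS would point toward a precipitation deficit in the gridded data during snowfall periods, and a negative DST toward a warm temperature bias, although separating the two requires the fitted relationships between DS, DST, DP, and DT.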

Simulated Snow Accumulation and Melting
The snow depth simulation model developed here also provides a route for the accurate simulation of snow accumulation and melting processes, provided that the complete set of influencing factors can be considered and the spatiotemporal scales of the study can be controlled. An ANN can be trained to model the relationship between the various factors that influence snowmelt and observed snowmelt data. Based on this training, the model can simulate the snow melting process accurately, although this approach requires a large number of observations.

Conclusions
Based on previous findings on the most influential factors controlling snow accumulation, temperature and precipitation were selected as the input variables for the backpropagation artificial neural network snow simulation model (BPANNSIM), which was built using MATLAB. The model was trained and validated using the National Climate Reference Station data, and the results showed that the iterative simulation capability of the model was strong for both temporal and regional sequences, with temporal and regional correlations (R²) of 0.94 and 0.97, respectively, for monthly snow cover duration, and 0.88 and 0.91, respectively, for monthly cumulative snow depth. The corresponding Nash coefficients between the observed and simulated values for the cumulative snow depth and duration were 0.91 and 0.87, respectively. Thus, the model's iterative simulation capability was slightly weaker for temporal sequences and snow depth than for regional sequences and snow cover duration. The NEX-GDDP dataset was used as the input for the BPANNSIM to simulate daily snow depth across China, and the snow depth data obtained from GFDL-ESM2G showed the highest level of accuracy. Finally, the causes of simulation errors for the GFDL-ESM2G model were analyzed, revealing that the coupling of precipitation and temperature data from the GFDL-ESM2G model was relatively poor during snowfall periods. It was also found that DS and DST were highly correlated with DP and DT, and a new validation method for gridded meteorological data was proposed based on this correlation. This method can verify the accuracy of gridded meteorological data within snowfall periods and verify whether the hydrothermal coupling of these data is reasonable. However, the method is applicable only to the validation of meteorological data during the snowfall period, and its validation error is causally related to the grid cell size of the meteorological data.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: The snow depth simulation dataset is publicly available. The data are currently being uploaded to the National Qinghai-Tibet Plateau Scientific Data Center (12 December 2021: http://data.tpdc.ac.cn/). The other data presented in this study are available on request from the corresponding author.