1. Introduction
Climate warming, which has been observed across the planet in recent decades, is also typical for the territory of Russia. The climate variables most often used as climate change indicators are surface air temperature and precipitation [
1]. The growth rate of the average annual temperature in Russia in 1976–2019 was 0.47 °C/decade, which is more than two and a half times higher than the rate of global temperature increase over the same time interval (0.18 °C/decade) [
2]. As for atmospheric precipitation for the same time interval, we see (according to Reference [
2]) that there is a tendency toward an increase in the annual precipitation in several regions of Siberia and the Russian Far East; however, precipitation decreases in the northeast of the country.
The most significant linear trend coefficients are observed in the regions of Siberia in the spring. Moreover, in Northern Eurasia, there is a moderate increase in the total amount of precipitation accompanied by a relatively strong increase in heavy rainfall and a simultaneous decrease in stratiform precipitation for the period 1966–2016 [
3]. According to the CMIP5 projections, in the 21st century, annual and seasonal precipitation will increase everywhere, especially in the arctic region of Russia [
4]. The frequency of heavy precipitation is influenced by changes in the characteristics of air humidity and atmospheric circulation that can lead to the development of extreme climatic events. First of all, this increases the frequency of severe floods and droughts [
5]. The trends in extreme precipitation are also correlated with the geographic features of the region and the quality of the data source used [
6,
7].
In general, there is an increase in the frequency and intensity of extreme precipitation across the territory of Russia [
7,
8,
9,
10,
11]; the number of days without precipitation increases in the winter and decreases in the summer [
2]. Moreover, most of the territory is characterized by an increase in the number of days with heavy snowfall [
9]. Thus, in Western Siberia, there is an increase in extreme precipitation in the winter and a tendency toward dry periods in the summer [
12]. In the south of the region, the risks of heavy and long-term precipitation increase [
6,
13].
The most accurate source of information on atmospheric precipitation is data of meteorological observations, but they are irregular in space and time [
14]. Satellite data provide regular observations in space, with almost global coverage. However, they have significant errors associated, among other things, with the inaccuracy of the algorithms for calculating precipitation based on the intensity of direct or scattered radiation [
15]. The reanalysis data are given at regular grids, but they may differ from the data of meteorological stations due to various simplifications, parameterizations, and numerical schemes that introduce errors into the calculations [
14].
A comparative analysis of the precipitation extremality indices based on the station observational data, reanalyses, and satellite measurements for the territory of Eurasia was carried out in Reference [
16]. It showed that the reanalysis data, in comparison with the observational ones, significantly underestimated the extreme precipitation values by 30–35% on average, and the satellite data overestimated the extreme values by 30–50% in the winter and underestimated them by 40–60% in the summer. The comparison of ERA5 reanalysis data and station observational data revealed a high linear correlation for the southern part of Siberia for the period from 1979 to 2015. In this case, the maximum differences are typical for mountainous regions [
17].
Note that average precipitation values are calculated with much greater accuracy than extreme precipitation, since the latter are observed less frequently and have a large uncertainty in magnitude, especially in regions with a sparse observational network (for example, in the north of Siberia) [
8,
18]. Therefore, the analysis of precipitation characteristics derived from different data sources can lead to contradictory conclusions. In this regard, an urgent question arises about the accuracy in assessing precipitation data, the importance of which is obvious not only for weather and climate change monitoring but also for solving forecasting problems.
The goal of this work is to investigate the variability and to compare the characteristics of atmospheric precipitation in Western Siberia over recent decades across different datasets.
The paper is organized as follows. In
Section 2, we present the study region and datasets and briefly describe the methods used. In
Section 3, the results of a time series analysis of precipitation characteristics (including their extremes) are presented. In
Section 4, we discuss the obtained results and compare them with the results of other studies. The conclusions, limitations, and future directions of this research are briefly summarized in
Section 5.
3. Results
The interannual variability of the smoothed average seasonal median estimates of precipitation amount in different datasets for the period from 1979 to 2018 is presented in
Figure 2. The figure shows that the GPCC data most closely approximate the annual values of the observational data. The APHRO data archive underestimates the values, and the NCEP reanalysis data overestimates them, which is especially pronounced in the warm season. At the same time, in the cold season, and only in the first half of the time interval (from 1979 to 1995), the highest values are observed for the ERA5 reanalysis dataset.
The NCEP reanalysis, in general, is characterized by the greatest variability in precipitation time series compared to other datasets. The discrepancies between the time series at the beginning of the 21st century are probably associated with using different calculation methods in data assimilation.
A spectral analysis of the time series showed that the periodic structure in the precipitation time series constructed from the observational data was clearly reproduced (significance level
p < 0.01) in ERA5 and in GPCC data, especially in the short-period part of the amplitude spectrum (fluctuation scale < 10 years), particularly in the cold season (
Figure 3).
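The amplitude spectrum referred to above can be illustrated with a minimal sketch. It assumes an evenly sampled annual series and a plain discrete Fourier transform; the function name `amplitude_spectrum` and the synthetic series are hypothetical, and the significance testing (p < 0.01) and any smoothing or detrending used in the study are not reproduced here.

```python
import cmath
import math


def amplitude_spectrum(series):
    """Return (period_in_years, amplitude) pairs for an annual time series."""
    n = len(series)
    mean = sum(series) / n
    # Remove the mean so that the zero-frequency term carries no amplitude.
    anomalies = [x - mean for x in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(a * cmath.exp(-2j * math.pi * k * t / n)
                    for t, a in enumerate(anomalies))
        # 2/n converts a DFT coefficient to a sinusoid amplitude;
        # the Nyquist term (2k == n) is not doubled.
        scale = 1.0 / n if 2 * k == n else 2.0 / n
        spectrum.append((n / k, scale * abs(coeff)))
    return spectrum


# Synthetic 40-year series with an 8-year cycle of 30 mm amplitude
# (illustrative values only, not data from the study).
series = [300.0 + 30.0 * math.sin(2 * math.pi * t / 8.0) for t in range(40)]
period, amplitude = max(amplitude_spectrum(series), key=lambda pa: pa[1])
print(period, round(amplitude, 3))
```

With a 40-year record, the resolvable periods are 40/k years for k = 1, ..., 20, so the fluctuation scales below 10 years correspond to k > 4.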
Statistically significant fluctuations (
p < 0.01) in the seasonal series obtained from observational data, as well as from GPCC and APHRO datasets, were discovered at periods of 7 to 8 years in the warm season, while shorter-period fluctuations also appeared in the extreme values series (
Figure 3c,e). In the cold season, the indicated periodicities were not distinguished in the observational data (
Figure 3b,d,f). However, as for the long-period part of the spectrum, a period of 12 to 13 years was determined for extremely high precipitation characteristics that also exist in the GPCC dataset (
Figure 3f). No other statistically significant long-period fluctuations were found in the data series from meteorological stations; however, they were found in the NCEP and APHRO datasets. This could result from the influence of the trend components on the variability during time series processing.
Analyzing the calculated values by the threshold quantiles (see
Table 1 and
Figure 4), we found that the average annual median estimates based on the observational data were closest to the corresponding estimates derived from the APHRO dataset (274.0 mm and 291.2 mm, respectively). The NCEP reanalysis data had the highest median values (490.9 mm).
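As an illustration of the threshold quantiles discussed here (1%, 5%, 50%, 95%, and 99%), the sketch below computes them for a single series. The quantile estimator actually used in the study is not stated in this section, so the linear-interpolation ("inclusive") method of Python's standard library is an assumption, and the function name and sample values are hypothetical.

```python
import statistics

THRESHOLDS = (1, 5, 50, 95, 99)  # quantile levels in percent


def threshold_quantiles(values):
    """Return a {percent: value} mapping for the threshold quantiles."""
    # quantiles(..., n=100) yields the 99 cut points for 1%, 2%, ..., 99%.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in THRESHOLDS}


# Illustrative sample only: the integers 0..100 as precipitation amounts in mm.
sample = [float(v) for v in range(101)]
result = threshold_quantiles(sample)
print(result)
```

The 50% entry is the median estimate, while the 1%/5% and 95%/99% entries correspond to the extremely low and extremely high values.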
In the warm season, the values of extremely low precipitation closest to the estimates from the observational data belonged to the ERA5 dataset for the 1% quantile and to the NCEP dataset for the 5% quantile. The extremely high values (95% and 99%) of precipitation at the stations were in good agreement with the APHRO and the GPCC datasets. For the cold season, good agreement was mainly observed with the ERA5 reanalysis data and, for extremely high precipitation (99%), with the NCEP reanalysis data. The greatest discrepancy with the station observational data in the cold season was observed for the APHRO data archive. Overall, the annually averaged extremely high values from the GPCC data showed the best agreement with the corresponding values from the station observational data.
The range of variability can be estimated using the coefficients of kurtosis (Ks) and skewness (As). To a first approximation, their analysis allowed us to describe the form of the probability density function (PDF), taking into account its deviation from the normal distribution. Analyzing the density function, we concluded that, for the cold season (as well as for the entire year in general), all datasets (except the APHRO dataset) were characterized by positive skewness (As > 0). That means that the distribution was right-skewed (had a longer right tail), which may indicate a decrease in precipitation values for the considered period and an increase in the frequency of extreme events with increased precipitation. This confirmed the results that were obtained earlier in Reference [
12]. In the warm season, only the GPCC dataset showed positive skewness. The analysis of the derived kurtosis coefficients showed that the warm season was characterized by a flat-topped distribution (Ks < 0); that is, the amount of precipitation over the studied time interval varied over a wide range of values. In the cold season, data from the stations and the ERA5 dataset had a narrow-peaked distribution (Ks > 0). This indicated that the values varied within a narrow range. For the entire year, the kurtosis coefficient was positive and determined mainly by the cold season.
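The sign conventions for As and Ks can be made concrete with a small sketch. It assumes the moment-based (population) estimators and that Ks is the excess kurtosis, so that a normal distribution gives Ks = 0; the exact estimators used in the study are not given in this section, and the function name and sample values are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Return (As, Ks) from the central moments of the sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5             # As > 0: longer right tail
    excess_kurt = m4 / m2 ** 2 - 3.0  # Ks > 0: narrow peak, Ks < 0: flat top
    return skew, excess_kurt


# Illustrative samples only (not data from the study).
right_tailed = [1.0, 1.0, 1.0, 1.0, 6.0]  # one large value stretches the right tail
flat = [1.0, 2.0, 3.0, 4.0, 5.0]          # evenly spread values

as_right, ks_right = skewness_and_excess_kurtosis(right_tailed)
as_flat, ks_flat = skewness_and_excess_kurtosis(flat)
print(as_right, ks_right, as_flat, ks_flat)
```

In this toy case, the sample with a single large value has As > 0 and Ks > 0, while the evenly spread sample has As = 0 and Ks < 0.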
Figure 5 shows the spatial distribution of precipitation based on the observational data. The amount of precipitation in the northern regions of Western Siberia is greater than in the southern ones.
This may happen due to geographical reasons (latitude and relief) or may be related to the dominant form of atmospheric circulation and the influence of the ocean. In both the warm and cold seasons, the maximum amount of precipitation was observed at the station located in the mountain area, and the minimum was found in the Chuy River Basin.
Figure 5 also shows the spatial distribution of the precipitation characteristics for all the datasets, with the incorporation of the values from the considered weather stations. The maximum values of extreme precipitation were also found in the Altay Mountains region in all the considered datasets. We found an increase in both extremely low and extremely high precipitation values in the northeast, northwest (the arctic zone), and central part (the Siberian Ridges) of the territory. This is consistent with the observational data, as was also mentioned in Reference [
17].
In the same areas, an increase in the linear trend coefficient for the average annual precipitation was observed based on the ERA5 and GPCC data. At the same time, the highest values of the linear trend were typical for the NCEP dataset. Using the NCEP and APHRO data, we outlined the tendencies for precipitation increase in the eastern part of Western Siberia with a maximum in the southeast (the Altay Mountains) and a decrease in the western and northwestern parts (the Ural Mountains). The conclusion about the closeness of the values of extreme precipitation from observational data to the GPCC data was also confirmed by the consistency of their spatial distribution over the territory of Western Siberia.
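The linear trend coefficient discussed above can be sketched as an ordinary least-squares slope of annual precipitation against year. The choice of units (mm per decade), the function name, and the synthetic series are assumptions made for illustration; the fitting procedure actually used in the study is not described in this section.

```python
def linear_trend_per_decade(years, values):
    """Least-squares slope of values against years, scaled to units per decade."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 10.0 * sxy / sxx


# Synthetic series for 1979-2018 rising by 1.5 mm per year (illustrative only).
years = list(range(1979, 2019))
values = [400.0 + 1.5 * (y - 1979) for y in years]
trend = linear_trend_per_decade(years, values)
print(trend)
```

Applying the same function at every grid node would give a trend map comparable in form to the spatial distributions described here.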
A comparative analysis of the amount of precipitation from different datasets was also performed using the Taylor diagrams constructed for the warm and cold seasons for the time interval from 1979 to 2018 (from 1979 to 2007 for the APHRO dataset) (
Figure 6). The data were interpolated from the reanalysis grid nodes to the station coordinates by the bilinear interpolation method. The analysis of the derived correlation coefficients showed a high correlation between the extreme precipitation data (5% and 95%) at the meteorological stations and the GPCC and ERA5 data (the values of the correlation coefficients varied from 0.73 to 0.91).
These datasets were characterized by variability in the amount of precipitation within the same range. The smallest values of the correlation coefficients (r) were observed in NCEP data (0.55 < r < 0.62). At the same time, the precipitation values in the APHRO dataset had less variability than the observational data. However, in both cases, there was a relatively large centered root mean square error for the precipitation values. We noted that similar tendencies were typical for the extreme precipitation values of 1% and 99%.
Additionally, the median values in all the datasets were in good agreement with the observational data; in particular, the correlation coefficient varied from 0.76 in the cold season to 0.95 in the warm season, compared to 0.65 in the cold season to 0.78 in the warm season for the NCEP dataset.
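The three quantities summarized by a Taylor diagram (standard deviation, correlation coefficient, and centered root mean square error) can be computed as in the following sketch. It assumes population standard deviations and paired series of equal length; the function name and the sample values are hypothetical and are not taken from the study.

```python
import math


def taylor_statistics(reference, test):
    """Return (std_ref, std_test, correlation, centered_rmse) for two paired series."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_t = sum(test) / n
    dr = [x - mean_r for x in reference]
    dt = [x - mean_t for x in test]
    std_r = math.sqrt(sum(d * d for d in dr) / n)
    std_t = math.sqrt(sum(d * d for d in dt) / n)
    corr = sum(a * b for a, b in zip(dr, dt)) / (n * std_r * std_t)
    # Centered RMSE: the RMS difference after removing each series' own mean.
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dr, dt)) / n)
    return std_r, std_t, corr, crmse


# Illustrative values only: station series (reference) and a gridded series.
station = [10.0, 20.0, 30.0, 40.0, 50.0]
gridded = [14.0, 22.0, 35.0, 41.0, 58.0]
std_r, std_t, corr, crmse = taylor_statistics(station, gridded)
print(std_r, std_t, corr, crmse)
```

The diagram works because these statistics satisfy crmse² = std_r² + std_t² − 2·std_r·std_t·corr, so a dataset can have a high correlation and still show a large centered error if its variability differs from that of the stations.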
The results of the comparative analysis of the estimates derived using the bilinear and cubic interpolation methods were generally similar, except for the cold season; the lowest correlation coefficient for the extremely low and high precipitation values was observed for the APHRO dataset (r < 0.3) (
Figure 7).
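For reference, the bilinear interpolation from grid nodes to station coordinates mentioned in this section can be sketched for a single grid cell. The sketch assumes a regular longitude-latitude cell and ignores missing values and grid edges; the function name, coordinates, and node values are hypothetical.

```python
def bilinear_interpolate(lon, lat, lon0, lon1, lat0, lat1, v00, v10, v01, v11):
    """Interpolate inside one grid cell.

    v00 is the value at (lon0, lat0), v10 at (lon1, lat0),
    v01 at (lon0, lat1), and v11 at (lon1, lat1).
    """
    tx = (lon - lon0) / (lon1 - lon0)  # fractional position along longitude
    ty = (lat - lat0) / (lat1 - lat0)  # fractional position along latitude
    bottom = v00 * (1.0 - tx) + v10 * tx
    top = v01 * (1.0 - tx) + v11 * tx
    return bottom * (1.0 - ty) + top * ty


# Hypothetical station at the centre of a 0.5-degree cell (illustrative values).
value = bilinear_interpolate(85.0, 56.5, 84.75, 85.25, 56.25, 56.75,
                             10.0, 20.0, 30.0, 40.0)
print(value)
```

Cubic interpolation uses a wider stencil of neighbouring nodes, so its results would be expected to differ most where precipitation varies sharply between nodes.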
4. Discussion
We find that the values from the GPCC dataset are closest to those from the observational data. This is probably because the GPCC data represent gridded observational data from stations. However, the APHRO archive (which also uses observational data) underestimates the values, and the NCEP reanalysis data overestimates them. The GPCC data also show the best agreement for annually averaged extremely high values. The agreement for other datasets can vary and can depend on the season. For example, the NCEP dataset can reproduce median and extreme values. According to the research for the European continent [
32], the NCEP2 dataset also provides the estimates of extreme precipitation closest to the station data.
Applying spectral analysis to the precipitation time series could allow us to better understand the characteristics of precipitation variability. Moreover, this is a fairly new approach to investigating extreme precipitation variability in Western Siberia. For example, the GPCC dataset reveals the periodicities in the time series of observational data from stations in the warm and cold seasons. The ERA5 dataset reproduces the general variability but with a smaller amplitude. Statistically significant fluctuations are mainly distinguished in the warm season at periods of 7 to 8 years, while shorter-period fluctuations also appear in the extreme values series. It should be noted that the revealed periodicities can be caused by dynamic processes in the atmosphere described by global atmospheric mechanisms, such as the North Atlantic Oscillation (NAO), the Atlantic Multidecadal Oscillation (AMO), and the Southern Oscillation (El Niño), whose time series also show this periodicity (7–9 years) [
33]. We suppose that this fact will be useful for the construction of climatic projections. The long-term periodicities systematically (for each characteristic) exist in the NCEP reanalysis data. This is caused by the presence of a trend (
Figure 2) in the time series, but a similar trend could not be observed in the time series derived from meteorological stations. However, such periodicities (15–17-year cycles) can occur in other regions of the planet [
34]. Additionally, in Reference [
35], a periodogram-based time series methodology revealed that the monthly average precipitation has two different periodic structures, of six months and twelve months, that coincide with the seasonal pattern of the time series. However, the interannual periodicity is not explicit enough. Moreover, no information is presented about the variability of the extreme values. The spectral analysis in the framework of this study revealed the periodic structure in the precipitation time series constructed from different datasets, where statistically significant values were mainly observed in the short-period part of the amplitude spectrum (fluctuation scale < 10 years). This result could be useful in the short-term forecasting of both the mean and extreme values of precipitation. For future research, it seems appropriate to apply the methods of multiscale and multivariate statistical analyses (including wavelet analysis). This will allow us to show the coherency between the components of two time series in the time–frequency domain and to provide better comparison and visualization of the observed periods.
Based on the observations, we see that the precipitation at the northern stations (situated above 60° N) is greater than at the southern ones (situated below 60° N). The maximum amount of precipitation was observed at a station located in the Altay Mountain area, and the minimum in the Chuy River Basin and in the Ural Mountains. The spatiotemporal variability of extreme precipitation revealed an increase in precipitation in the northeast and northwest (the arctic zone) and in the central part of the territory (the Siberian Ridges) that was consistent with the observational data. The correlation analysis showed that the GPCC and the ERA5 datasets were in good agreement with the observations (the correlation coefficient was up to 0.91). We obtained quite good agreement between observational data at the stations and GPCC data. The GPCC dataset outcomes can be explained by the fact that GPCC holds the largest and most comprehensive worldwide collection of precipitation data. This is based on daily surface synoptic observations and monthly climate messages [
36]. Moreover, it supports regional climate monitoring and climate variability analyses. The ERA5 reanalysis (which replaces the ERA-Interim reanalysis) has enhanced spatial and temporal resolutions in comparison with the other reanalyses, which allowed us to obtain more detailed information. The APHRODITE project develops daily precipitation datasets with high-resolution grids for Asia; however, its time series cover a limited period. The NCEP reanalysis data have a rather coarse grid resolution for analyzing the variability of regional precipitation characteristics, especially their extremes.
Thus, we compared different types of precipitation datasets for Western Siberia with different spatial and temporal resolutions: station observational data, gridded data, and reanalysis data. The choice of an appropriate data source for research on the variability of precipitation characteristics will depend, first of all, on the goal of the investigation. Moreover, the regional differences in the long-term tendencies of the precipitation characteristics (means and extremes) will depend on the changes in the data assimilation and parametrization models used in different datasets. The median estimates of the precipitation amount derived from station data and reanalysis data are in better agreement with each other than their extreme values. At the same time, in some cases, the temporal variability of the extremes can be quite effectively diagnosed by reanalyses, at least in comparison to the median values of precipitation [
32].
Some of our results on the agreement between observational and reanalysis data have also been reported in similar research [
2,
8]. However, in Reference [
37], it has been found that the APHRO archive is closest to the real observations in comparison with ERA-Interim for the Siberian region. This result is explained by the fact that the validation was made based on a single parameter (RMSE) and did not take into account other statistical characteristics. The novelty of this study is that we proposed a comparative analysis not only for the mean values of precipitation but also for their extremes. The use of different statistical methods (descriptive statistics, Fourier spectrum, and Taylor diagrams) makes the results presented in this study more reliable. This is quite important for the arctic part of the region, where the observational network is very sparse.
5. Conclusions
In the framework of this study, we presented a comparative analysis of the atmospheric precipitation characteristics (mean and extremes) in Western Siberia from 1979 to 2018 across different datasets.
The performed analysis was based on data acquired from meteorological stations, global precipitation datasets such as APHRODITE and GPCC, and reanalysis archives, including NCEP-DOE and ERA5. The comparison was based on the methods of descriptive statistics, Fourier spectrum, and Taylor diagrams.
The values from the observational data agreed best with the values from GPCC. This archive also reproduced the periodicities in the time series of observational data from the meteorological stations, especially in the short-period part of the spectrum. Underestimated values were revealed for the APHRODITE archive, while overestimated ones were found for the NCEP reanalysis data. In comparison with GPCC, the ERA5 dataset reproduced the general variability but with a smaller amplitude (the correlation coefficient was up to 0.9). In general, the median estimates of the precipitation amount derived from the meteorological stations’ data, as well as from the reanalysis data, were in better agreement with each other than their extreme values. However, their temporal variability can be effectively described by other datasets.
The results obtained from the validation can be useful in solving various problems in climatology associated with the use of data on the variable precipitation characteristics and extreme events (when studying the conditions for the formation of droughts, forest fires, degradation of the permafrost zone, etc.), as well as for the development and correction of regional climate models for more accurate climate change projections. In the framework of this study, we focused on a descriptive comparative analysis of the precipitation characteristics, in which spectral analysis is one part of the research. The novelty of our work is that we compared the amplitude spectra of the time series of territory-averaged precipitation values, as well as of their extreme values.
Our future work will therefore focus on the application of multiscale and multivariate statistical analyses (including a wavelet analysis) that will allow us to analyze the precipitation time series spectrum in more detail and to provide better comparisons and visualization of the results.