Evaluation of 2-m Air Temperature and Surface Temperature from ERA5 and ERA-I Using Buoy Observations in the Arctic during 2010–2020

In data-sparse regions such as the Arctic, atmospheric reanalysis is one of the key tools for understanding rapid climate change at the regional and global scales. The utility of reanalysis datasets based on data assimilation is affected by their accuracy and biases. Therefore, it is important to evaluate their performance. Here, we conduct inter-comparisons of two temperature variables, namely, the 2-m air temperature (Ta) and the surface temperature (Ts), from the widely used ERA-I and ERA5 reanalysis datasets provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) against in situ observations from three international buoy programs (i.e., the International Arctic Buoy Programme (IABP), the Multidisciplinary Drifting Observatory for the Study of Arctic Climate (MOSAiC), and the Cold Regions Research and Engineering Laboratory (CRREL)) during 2010–2020 in the Arctic. Overall, the results show that both the ERA-I and ERA5 were well correlated with the buoy observations, with the highest correlation coefficient reaching 0.98. There were generally warm Ta biases for both ERA-I (2.27 ± 3.33 °C) and ERA5 (2.34 ± 3.22 °C) when compared with more than 3000 matching pairs of daily buoy observations. The warm Ta biases of both reanalysis datasets exhibited seasonal variations, reaching the maximum of 3.73 ± 2.84 °C in April and the minimum of 1.36 ± 2.51 °C in September. For Ts, both ERA-I and ERA5 exhibited good consistencies with the buoy observations, but have higher amplitude biases compared with those for Ta, with generally negative biases of −4.79 ± 4.86 °C for ERA-I and −4.11 ± 3.92 °C for ERA5. For both reanalysis datasets, the largest bias of Ts (−11.18 ± 3.08 °C) occurred in December, while the biases were rather small (less than −3 °C) in the warmer months (April to October). The cold Ts biases for ERA-I and ERA5 were probably overestimated due to the location of the surface temperature sensors on the buoys, which may have been affected by snow cover. Both the Ta and Ts biases varied for different buoy programs and different sea ice concentration conditions, yet they exhibited similar trends.


Introduction
The climate in the Arctic has undergone profound changes in recent decades, mainly caused by the Arctic amplification effect; that is, the Arctic is warming more than twice as fast as the global average temperature [1][2][3]. In addition, there has been an increasing frequency and intensity of winter warming events and longer durations of a single event in the Arctic, as well as a positive trend in the maximum temperature in winter [4,5] During Table 1. Assessments of the temperature variables in the ERA series reanalysis datasets over the cold regions in the Northern Hemisphere in the previous studies. The research methods for these studies were similar and can be used to compare the reanalysis datasets with in-situ measurements.

Reanalysis Data Parameters Conclusions
Tjernström & Graversen (2009) [40] 1997-1998 Beaufort Sea Soundings ERA-40 Temperature at atmospheric boundary layer The ERA-I reanalysis dataset provides critical information for calibrating proxy records of several atmospheric and surface parameters around the world [30]. It was generated to serve as a bridge between the successful ERA-40 atmospheric reanalysis and future generations of reanalysis provided by the ECWMF [30]. This dataset has been available, but with a two-month delay, from 1979 to 2019, with a spatial resolution of 0.125 • and a temporal frequency of 6 h. The ERA-Interim ceased in August 2019 and was succeeded by the ERA5 reanalysis. The 2-m air temperature and skin temperature variables were used in the comparisons in this study. The dataset was obtained (for free but requiring registration) from the following address: https://www.ecmwf.int/en/forecasts/datasets/reanalysisdatasets/era-interim (accessed on 2 December 2020).

ERA5
ERA5 is the latest climate reanalysis dataset released by the ECMWF and is a successor to the ERA-I dataset [31]. The ERA5 dataset provides atmospheric and surface parameters from 1950 to the present. The most substantial improvements compared to the ERA-Interim Remote Sens. 2021, 13, 2813 4 of 19 include better availability and a higher temporal resolution (hourly) [34]. Compared to ERA-I, ERA5 also provides more parameters. ERA5 benefits from several upgrades, including the observation operators, and more importantly, a new data assimilation strategy (Integrated Forecast System (IFS) Cy41r2). The 2-m air temperature and skin temperature variables were used in the comparisons in this study. The dataset was obtained (for free but requiring registration) from the following website: https://www.ecmwf.int/en/forecasts/ datasets/reanalysis-datasets/era5 (accessed on 2 December 2020).

International Arctic Buoy Program
The International Arctic Buoy Program (IABP) is a coordinating program for autonomous buoy activities targeting numerous objectives in the Arctic [44]. The operational area of the IABP is the entire Arctic Ocean, including its marginal seas. The Argos system was implemented to transmit the in-situ observation data. There are hundreds of autonomous buoys and drifting stations under normal operating conditions, and these instruments were deployed every year at the North Pole, in the Beaufort Gyre, and at other locations of opportunity. Most of the instruments in the IABP can provide surface temperature time series data, but it is hard to get access to the vertical temperature profiles due to a lack of temperature chain sensors (https://iabp.apl.uw.edu/index.html (accessed on 15 November 2020)). It should be noted that the surface temperature is measured at the bottom of the buoy's hull. If the buoy is floating, then the reported temperature is the sea surface. If the buoy is frozen in ice, or sitting on top of ice, then the reported temperature is the surface temperature of the ice. The freezing temperature of sea water is about −1.8 • C, so temperature readings below this indicate ice temperatures. In addition to surface temperature, there are other basic parameters in each buoy file, for example, the buoy ID, year, hour, min, DOY, POS_DOY (the day of the year to the decimal minute of the reported position), latitude, and longitude. As shown in Figure 1a, the IABP buoy dataset used in this study was the daily updated level-1 in the Arctic during 2011/1-2020/10 and provided records at hourly intervals. We selected 730 buoys deployed in different regions of the Arctic (North of 70 • N), including 589 buoys for surface temperature and 141 buoys for air temperature. assimilation strategy (Integrated Forecast System (IFS) Cy41r2). The 2-m air temperature and skin temperature variables were used in the comparisons in this study. The dataset was obtained (for free but requiring registration) from the following website: https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5 (accessed on 2 nd , December, 2020).

International Arctic Buoy Program
The International Arctic Buoy Program (IABP) is a coordinating program for autonomous buoy activities targeting numerous objectives in the Arctic [44]. The operational area of the IABP is the entire Arctic Ocean, including its marginal seas. The Argos system was implemented to transmit the in-situ observation data. There are hundreds of autonomous buoys and drifting stations under normal operating conditions, and these instruments were deployed every year at the North Pole, in the Beaufort Gyre, and at other locations of opportunity. Most of the instruments in the IABP can provide surface temperature time series data, but it is hard to get access to the vertical temperature profiles due to a lack of temperature chain sensors (https://iabp.apl.uw.edu/index.html (accessed on 15 November 2020)). It should be noted that the surface temperature is measured at the bottom of the buoy's hull. If the buoy is floating, then the reported temperature is the sea surface. If the buoy is frozen in ice, or sitting on top of ice, then the reported temperature is the surface temperature of the ice. The freezing temperature of sea water is about −1.8 °C, so temperature readings below this indicate ice temperatures. In addition to surface temperature, there are other basic parameters in each buoy file, for example, the buoy ID, year, hour, min, DOY, POS_DOY (the day of the year to the decimal minute of the reported position), latitude, and longitude. As shown in Figure 1a, the IABP buoy dataset used in this study was the daily updated level-1 in the Arctic during 2011/1-2020/10 and provided records at hourly intervals. We selected 730 buoys deployed in different regions of the Arctic (North of 70° N), including 589 buoys for surface temperature and 141 buoys for air temperature.

MOSAiC
The Multidisciplinary Drifting Observatory for the Study of Arctic Climate (MOSAiC) program was a one-year-long (September 2019-October 2020) expedition in the Central Arctic (https://mosaic-expedition.org/expedition/ (accessed on 10 December 2020)). It was the largest international Arctic research initiative in history, was organized around broad science questions, and was mainly motivated by the dramatic changes in the Arctic climate system over the last few decades [45]. In order to obtain the needed observations, MOSAiC included the construction of a distribution network with its outer boundary defined by a set of nodes at the four corners of a~50 km box centered on the Central Observatory. Each node could drift with the ice floe and included a suite of multi-disciplinary, buoy-based measurements, which could be in operation for many years [45]. As shown in Figure 1b, we utilized observations from 167 buoys from 2019/10 to 2020/12, including 87 buoys for surface temperature and 80 buoys for air temperature. Both temperatures were recorded at half-hour intervals.

CRREL
The overarching goal of the Cold Regions Research and Engineering Laboratory (CRREL)-Dartmouth Mass Balance Buoy Program was to be a component of a sustainable Arctic observation network. These buoys were equipped with oceanic and atmospheric measurement systems that provided a complete vertical profile of the atmosphere, ice, and upper ocean properties (http://imb-crrel-dartmouth.org/ (accessed on 14 November 2020)). These instruments mainly recorded the hourly air temperature, mean sea level pressure, and vertical temperature profile using a set of temperature sensor chains located in the air, snow, ice, and the upper ocean [35]. Based on the temperature profile, with a 2 cm interval between adjacent sensors, it is easy to obtain the boundary positions between the air, snow, ice, and ocean. A total of 13 buoys were selected for use in this study, which were also utilized in a previous study [35]. The trajectories of these buoys during 2010/4-2016/10 are shown in Figure 1c.

Methods
As was mentioned above, the observations utilized in this study came from three different international buoy programs with various data record formats and observation parameters. For convenience, we transformed the formats of the different buoy datasets into the same format as the ERA reanalysis data. First, we processed the different buoy observations to generate data with identical information, including buoy ID, date, location, air temperature, and surface temperature. This is efficient for further extracting the buoy observation records and matching them with the ERA5 and ERA-I temperature data. Second, using the programming tools, we automatically matched and extracted the air/surface temperature and geo-location information from the buoy observations and the reanalysis data when the corresponding date and time in the two datasets were the same.
Since the ERA-I and ERA5 reanalysis datasets were gridded as 0.25 • and 0.125 • , respectively, and the observations from the buoys were collected based on single coordinates, we assigned the temperature record of the buoy observations to its corresponding grid point in the reanalysis data based on the principle of the nearest distance to generate a set of matching pairs between the buoy observations and the reanalysis data. Finally, 1,938,483 pairs from IABP, 431,874 pairs from MOSAiC, and 90,141 pairs from CRREL were extracted and analyzed in this study. January, the temperature was the lowest, and then, it gradually rose from February to June, with a continuous high temperature near 0 • C through the summer. Beginning in September, the temperature dropped significantly to around −30 • C until the end of the year, leading to a clear seasonal Ta cycle.

Evaluation of 2-m Air Temperature (Ta)
3.1.1. Long-Term Variations Figure 2a shows the time series of the daily mean Ta from the ERA5 (blue), ERA-I (red), and the buoy observations (black), averaged from all of the buoys' trajectories during 2010-2020. All three datasets exhibit similar inter-annual and seasonal cycles. During January, the temperature was the lowest, and then, it gradually rose from February to June, with a continuous high temperature near 0 °C through the summer. Beginning in September, the temperature dropped significantly to around −30 °C until the end of the year, leading to a clear seasonal Ta cycle. As shown in Figure 2b, the biases for both the ERA5 and the ERA-I were generally positive but were different in different years. For example, both biases were up to 10 °C during the summer of 2017 when the Ta values were largely above 0 °C but were mostly less than 5 °C. It is noteworthy that both reanalysis datasets exhibited cold biases (as large as −5 °C) in 2015 and 2019. The differences between the ERA-I and ERA5 exhibited clear interannual and seasonal variations. In general, the Ta from the ERA5 was 0.21 °C warmer than that from the ERA-I.
Large differences between the ERA-I and ERA5 have also been found in previous buoy comparisons, reaching more than 4 °C. Further comparisons have indicated that the biases were larger when the air temperature was less than −25 °C [35].
The biases of greater than 10 °C were checked in detail, and it was found that they mainly occurred during 2017-2018 when only 21 IABP buoys were available for these two years, and that no observations were available from CRREL (available during 2010-2016) and MOSAiC (available during 2019-2020). Therefore, the limitation caused by the number of buoys may have caused the larger biases, which indicated that a large number of buoys and cross buoy programs would increase the reliability of the comparisons. In our study, the available period for ERA-I was during 2010-2019, which stopped to update since August 2019. Whereas, ERA5 is an updated version with a longer time series of 2010-2020. The slightly different period for the ERA-I and the ERA5 may introduces biases too. As shown in Figure 2b, the biases for both the ERA5 and the ERA-I were generally positive but were different in different years. For example, both biases were up to 10 • C during the summer of 2017 when the Ta values were largely above 0 • C but were mostly less than 5 • C. It is noteworthy that both reanalysis datasets exhibited cold biases (as large as −5 • C) in 2015 and 2019. The differences between the ERA-I and ERA5 exhibited clear interannual and seasonal variations. In general, the Ta from the ERA5 was 0.21 • C warmer than that from the ERA-I.
Large differences between the ERA-I and ERA5 have also been found in previous buoy comparisons, reaching more than 4 • C. Further comparisons have indicated that the biases were larger when the air temperature was less than −25 • C [35].
The biases of greater than 10 • C were checked in detail, and it was found that they mainly occurred during 2017-2018 when only 21 IABP buoys were available for these two years, and that no observations were available from CRREL (available during 2010-2016) and MOSAiC (available during 2019-2020). Therefore, the limitation caused by the number of buoys may have caused the larger biases, which indicated that a large number of buoys and cross buoy programs would increase the reliability of the comparisons. In our study, the available period for ERA-I was during 2010-2019, which stopped to update since August 2019. Whereas, ERA5 is an updated version with a longer time series of 2010-2020. The slightly different period for the ERA-I and the ERA5 may introduces biases too. The different temporal and spatial scales of the reanalysis datasets and the buoy observations, i.e., gridded data (0.125 • and 0.25 • ) for the reanalysis and trajectory measurements for the buoys, also introduced biases into the comparisons. Figure 3a shows a scatter plot of the daily mean Ta from the ERA-I and buoy observations during 2011-2019, with 3276 matching pairs. The Ta of the ERA-I and buoy observations generally agreed well, and the correlation coefficient was 0.95. The multiyear mean Ta of the buoy observations was −13.56 ± 10.76 • C and that of the ERA-I was −11.29 ± 10.46 • C, which indicates that the ERA-I shows a warm bias of 2.27 ± 3.33 • C. Similar to Figure 3a, the scatter plot of the daily mean Ta from the ERA5 during 2010-2020 is shown in Figure 3b, with a total of 3599 matching pairs and a correlation coefficient of 0.95. The multi-year mean Ta was −12.93 ± 10.46 • C for the buoy observations and −10.59 ± 9.88 • C for the ERA5. These comparisons indicate that the ERA5 exhibited a slight warmer bias of 2.34 ± 3.22 • C than the ERA-I.
The different temporal and spatial scales of the reanalysis datasets and the buoy observations, i.e., gridded data (0.125° and 0.25°) for the reanalysis and trajectory measurements for the buoys, also introduced biases into the comparisons. Figure 3a shows a scatter plot of the daily mean Ta from the ERA-I and buoy observations during 2011-2019, with 3276 matching pairs. The Ta of the ERA-I and buoy observations generally agreed well, and the correlation coefficient was 0.95. The multi-year mean Ta of the buoy observations was −13.56 ± 10.76 °C and that of the ERA-I was −11.29 ± 10.46 °C, which indicates that the ERA-I shows a warm bias of 2.27 ± 3.33 °C. Similar to Figure 3a, the scatter plot of the daily mean Ta from the ERA5 during 2010-2020 is shown in Figure 3b, with a total of 3599 matching pairs and a correlation coefficient of 0.95. The multi-year mean Ta was −12.93 ± 10.46 °C for the buoy observations and −10.59 ± 9.88 °C for the ERA5. These comparisons indicate that the ERA5 exhibited a slight warmer bias of 2.34 ± 3.22 °C than the ERA-I. As shown in Figure 3, the Ta values from both the reanalysis datasets were mostly distributed close to the 1:1 line, and they generally exhibited warmer trends when below 0 °C. The lower the temperatures, the larger the warm biases. As shown in Table 2, if the 10 °C temperature intervals are considered, the results show that the ERA5 exhibited warmer pattern in all of the intervals, except for 0-10 °C. This is consistent with our conclusion based on the comparison of the total biases of the ERA-I and ERA5.  Figure 4 shows the monthly mean biases of the Ta between the reanalysis datasets and the buoy observations. In general, the biases were warm for both ERA-I and ERA5 in all of the months, and the largest biases occurred in April (3.02 ± 2.79 °C for ERA-I and 3.73 ± 2.84 °C for ERA5; Table 3). The biases were relatively small in August, September, October, and November, i.e., less than 2 °C. The monthly differences between the ERA5 As shown in Figure 3, the Ta values from both the reanalysis datasets were mostly distributed close to the 1:1 line, and they generally exhibited warmer trends when below 0 • C. The lower the temperatures, the larger the warm biases. As shown in Table 2, if the 10 • C temperature intervals are considered, the results show that the ERA5 exhibited warmer pattern in all of the intervals, except for 0-10 • C. This is consistent with our conclusion based on the comparison of the total biases of the ERA-I and ERA5. Table 2. Biases between the buoy observations and reanalysis datasets based on Ta temperature intervals.

Temperature Interval
Biases of ERA-I Biases of ERA5 Figure 4 shows the monthly mean biases of the Ta between the reanalysis datasets and the buoy observations. In general, the biases were warm for both ERA-I and ERA5 in all of the months, and the largest biases occurred in April (3.02 ± 2.79 • C for ERA-I and 3.73 ± 2.84 • C for ERA5; Table 3). The biases were relatively small in August, September, October, and November, i.e., less than 2 • C. The monthly differences between the ERA5 and ERA-I exhibited seasonal variations, with negative differences occurring in the warm season (July, August, and September) and positive differences occurring in the other months. The largest negative difference was −0.36 ± 0.80 • C in July, and the largest positive bias was 1.17 ± 1.79 • C in March (Table 3). and ERA-I exhibited seasonal variations, with negative differences occurring in the warm season (July, August, and September) and positive differences occurring in the other months. The largest negative difference was −0.36 ± 0.80 °C in July, and the largest positive bias was 1.17 ± 1.79 °C in March (Table 3).   Figure 5 shows the calendar diagram of the multi-year mean daily Ta bias between the buoy observations and the two reanalysis datasets and the biases between ERA5 and ERA-I. Compared with the buoy observations, both reanalysis datasets exhibited clear and overall warm biases during the entire year, with only one day showing a cold bias for ERA-I and four days showing cold biases for ERA5. For ERA-I, the largest bias was 6.23 °C, with 80.5% of the bias exceeding 2 °C, and for ERA5, the largest bias was 4.85 °C, with 79.5% of the bias exceeding 2 °C. ERA5 exhibited much warmer daily biases from January to June, especially in February and April.   Figure 5 shows the calendar diagram of the multi-year mean daily Ta bias between the buoy observations and the two reanalysis datasets and the biases between ERA5 and ERA-I. Compared with the buoy observations, both reanalysis datasets exhibited clear and overall warm biases during the entire year, with only one day showing a cold bias for ERA-I and four days showing cold biases for ERA5. For ERA-I, the largest bias was 6.23 • C, with 80.5% of the bias exceeding 2 • C, and for ERA5, the largest bias was 4.85 • C, with 79.5% of the bias exceeding 2 • C. ERA5 exhibited much warmer daily biases from January to June, especially in February and April.
Based on Table 3, both the reanalysis datasets and buoy observations reveal that the maximum monthly temperature occurred in July (1.40 ± 1.37 • C for ERA5, and 1.76 ± 1.70 • C for ERA-I, and −0.55 ± 2.58 • C for buoy). The smallest monthly bias was 1.36 ± 2.51 • C in September for ERA5 and 1.72 ± 2.62 • C in October for ERA-I. Otherwise, the largest bias for ERA5 occurred in April (3.73 ± 2.84 • C), and that of the ERA-I-buoy occurred in July (3.12 ± 2.44 • C). For both reanalysis methods, the root mean square error (RMSE) values were nearly double in January, February, and March, which indicates larger bias variations in these rather cold months compared with other months. Remote Sens. 2021, 13, x FOR PEER REVIEW 9 of 19 Based on Table 3, both the reanalysis datasets and buoy observations reveal that the maximum monthly temperature occurred in July (1.40 ± 1.37 °C for ERA5, and 1.76 ± 1.70 °C for ERA-I, and −0.55 ± 2.58 °C for buoy). The smallest monthly bias was 1.36 ± 2.51 °C in September for ERA5 and 1.72 ± 2.62 °C in October for ERA-I. Otherwise, the largest bias for ERA5 occurred in April (3.73 ± 2.84 °C), and that of the ERA-I-buoy occurred in July (3.12 ± 2.44 °C). For both reanalysis methods, the root mean square error (RMSE) values were nearly double in January, February, and March, which indicates larger bias variations in these rather cold months compared with other months.
Our assessments of the 2-m air temperature are highly consistent with the results of a previous study, that is, 0.75-4.4 °C and 0.75-8.0 °C warm biases for ERA-I and ERA5, respectively, based on 13 CRREL buoys in the Arctic during 2010-2016 [35]. In addition, other previous studies have shown that the warm biases in the reanalysis data widely occurred in polar regions under cold conditions when compared with other measurements, including buoys [40,41,43,[46][47][48].
Our results are comparable to those of the previous studies shown in Table 1. Numerous comparisons of the 2-m air temperature of the ERA-I have been carried out over the Arctic and in the high mountain regions in the Northern Hemisphere. The results show that warm biases occurred in all of the studies. For the 2-m air temperature of the ERA5, a study in 2019 compared it with in-situ observations from 16 buoys in the Arctic and found a 0-8 °C warm bias, which is consistent with our result (2.34 ± 3.22 °C). However, we used hundreds of buoys (total 234 buoys) compared to the 16 buoys used in the previous study, indicating that our comparisons are more reliable.
The accuracy of the assessment of the 2-m air temperature may be related to two factors. First, it is affected by the height of the air temperature sensor on the buoy, which is usually lower than 2 m for the buoys. Another reason is the poorly modelled near-surface inversion of the reanalysis datasets [40,41,49], which may be connected with the strong turbulent diffusivity in the boundary layer under stable conditions [49], or the simple representation of the sea ice thickness as 1.5 m and the snow-free assumption over the sea ice in the Arctic [35,47].  Our assessments of the 2-m air temperature are highly consistent with the results of a previous study, that is, 0.75-4.4 • C and 0.75-8.0 • C warm biases for ERA-I and ERA5, respectively, based on 13 CRREL buoys in the Arctic during 2010-2016 [35]. In addition, other previous studies have shown that the warm biases in the reanalysis data widely occurred in polar regions under cold conditions when compared with other measurements, including buoys [40,41,43,[46][47][48].

Evaluation of Surface
Our results are comparable to those of the previous studies shown in Table 1. Numerous comparisons of the 2-m air temperature of the ERA-I have been carried out over the Arctic and in the high mountain regions in the Northern Hemisphere. The results show that warm biases occurred in all of the studies. For the 2-m air temperature of the ERA5, a study in 2019 compared it with in-situ observations from 16 buoys in the Arctic and found a 0-8 • C warm bias, which is consistent with our result (2.34 ± 3.22 • C). However, we used hundreds of buoys (total 234 buoys) compared to the 16 buoys used in the previous study, indicating that our comparisons are more reliable.
The accuracy of the assessment of the 2-m air temperature may be related to two factors. First, it is affected by the height of the air temperature sensor on the buoy, which is usually lower than 2 m for the buoys. Another reason is the poorly modelled near-surface inversion of the reanalysis datasets [40,41,49], which may be connected with the strong turbulent diffusivity in the boundary layer under stable conditions [49], or the simple representation of the sea ice thickness as 1.5 m and the snow-free assumption over the sea ice in the Arctic [35,47]. Figure 6a clearly depicts the generally good agreement between the annual Ts cycles from ERA5 (blue), ERA-I (red), and the buoy observations (black). The daily mean Ts for both the ERA5 and ERA-I ranged from −35 • C in winter to nearly 5 • C in summer during 2011-2020. Compared with the buoy observations, both reanalysis methods exhibited significant cold biases, except for ERA5 and ERA-I in 2013 (Figure 6b). The biases exhibited clear seasonal variations, with some larger than −10 • C in winter and close to 0 • C in summer. The differences between ERA5 and ERA-I changed significantly during different years. In particular, they were greater than 5 • C in 2013, 2017, 2018, and 2019. 2011-2020. Compared with the buoy observations, both reanalysis methods exhibited significant cold biases, except for ERA5 and ERA-I in 2013 (Figure 6b). The biases exhibited clear seasonal variations, with some larger than −10 °C in winter and close to 0 °C in summer. The differences between ERA5 and ERA-I changed significantly during different years. In particular, they were greater than 5 °C in 2013, 2017, 2018, and 2019. A total of 3060 and 3487 matching pairs for ERA-I and ERA5, respectively, were used for the comparisons of the daily mean Ts (Figure 7a,b). Based on the scatter plots, both reanalysis methods exhibited significant cold biases, and the biases were larger when the surface temperature was low. Furthermore, as shown in Table 4, if the 10 °C temperature intervals are considered, the ERA-I was closer to the buoy observations in the cold environments (lower than −20 °C), while the ERA5 performed better in the −10-0 °C and 0-10 °C intervals.  A total of 3060 and 3487 matching pairs for ERA-I and ERA5, respectively, were used for the comparisons of the daily mean Ts (Figure 7a,b). Based on the scatter plots, both reanalysis methods exhibited significant cold biases, and the biases were larger when the surface temperature was low. Furthermore, as shown in Table 4, if the 10 • C temperature intervals are considered, the ERA-I was closer to the buoy observations in the cold environments (lower than −20 • C), while the ERA5 performed better in the −10-0 • C and 0-10 • C intervals. 2011-2020. Compared with the buoy observations, both reanalysis methods exhibited nificant cold biases, except for ERA5 and ERA-I in 2013 (Figure 6b). The biases exhib clear seasonal variations, with some larger than −10 °C in winter and close to 0 °C in s mer. The differences between ERA5 and ERA-I changed significantly during diffe years. In particular, they were greater than 5 °C in 2013, 2017, 2018, and 2019. A total of 3060 and 3487 matching pairs for ERA-I and ERA5, respectively, were u for the comparisons of the daily mean Ts (Figure 7a,b). Based on the scatter plots, b reanalysis methods exhibited significant cold biases, and the biases were larger when surface temperature was low. Furthermore, as shown in Table 4, if the 10 °C tempera intervals are considered, the ERA-I was closer to the buoy observations in the cold e ronments (lower than −20 °C), while the ERA5 performed better in the −10-0 °C and 0 °C intervals.    Figure 8 shows the seasonal changes in the biases between the ERA5, ERA-I, and buoy observations. Both the ERA5 and ERA-I exhibited cold biases for all of the months when compared with the buoy observations. The Ts from ERA5 exhibited warmer biases for most of the period, except for March and April, when compared with ERA-I. The cold biases (greater than −6 • C) usually occurred in the cold season (November, December, January, and February), and the largest bias occurred in December, exceeding −10 • C for both reanalysis datasets. However, in the warm summer, the biases were usually rather small, less than −3 • C from April to October.   Figure 8 shows the seasonal changes in the biases between the ERA5, ERA-I, and buoy observations. Both the ERA5 and ERA-I exhibited cold biases for all of the months when compared with the buoy observations. The Ts from ERA5 exhibited warmer biases for most of the period, except for March and April, when compared with ERA-I. The cold biases (greater than −6 °C) usually occurred in the cold season (November, December, January, and February), and the largest bias occurred in December, exceeding −10 °C for both reanalysis datasets. However, in the warm summer, the biases were usually rather small, less than −3 °C from April to October.  Figure 9 shows a calendar diagram of the multi-year mean daily Ts biases for ERA5 and ERA-I. Compared with the buoy observations, there was a notable cold bias for the entire year, except for 8 days in May for ERA-I and 18 days during May and June for ERA5. For ERA-I, the maximum bias was −10.81 °C, and 46% of the biases of the time series were greater than −4 °C. For ERA5, the maximum bias was −10.67 °C, and the proportion of the biases greater than −4 °C was also 46%. In addition, the seasonal cycles of the daily cold bias were obvious for both reanalysis datasets, which illustrates the relatively larger bias during the cold seasons and the relatively smaller bias during the warm seasons.  As shown in Table 5, the Ts from both reanalysis datasets exhibited seasonal cycles, and the lowest monthly mean temperatures occurred in January, reaching −24.06 °C for ERA5 and −25.26 °C for ERA-I. In July, the mean monthly temperature increased to the maximum and exceeded 0 °C for both reanalysis datasets. The monthly differences between the ERA5 and ERA-I were small, from −0.54 °C to 1.49 °C. The largest difference was 1.49 ± 2.26 °C in November and the smallest was −0.08 ± 2.98 °C in April. The Ts values from ERA5 were higher than those from ERA-I during most of the months, except for March and April. Compared with the buoy observations, the biases in all the months were negative, ranging −0.54 to −9.90 °C for ERA5 and from −1.29 to −11.18 °C for ERA-I, and they were larger in the cold season than in the warm season. The RMSE distribution also illustrates the seasonal variations, which were about five to ten times larger in the cold seasons (maximum of 10.33 °C for ERA5 and 11.59 °C for ERA-I) compared with the warm seasons (minimum of 1.48 °C for ERA5 and 2.01 °C for ERA-I). As shown in Table 5, the Ts from both reanalysis datasets exhibited seasonal cycles, and the lowest monthly mean temperatures occurred in January, reaching −24.06 • C for ERA5 and −25.26 • C for ERA-I. In July, the mean monthly temperature increased to the maximum and exceeded 0 • C for both reanalysis datasets. The monthly differences between the ERA5 and ERA-I were small, from −0.54 • C to 1.49 • C. The largest difference was 1.49 ± 2.26 • C in November and the smallest was −0.08 ± 2.98 • C in April. The Ts values from ERA5 were higher than those from ERA-I during most of the months, except for March and April. Compared with the buoy observations, the biases in all the months were negative, ranging −0.54 to −9.90 • C for ERA5 and from −1.29 to −11.18 • C for ERA-I, and they were larger in the cold season than in the warm season. The RMSE distribution also illustrates the seasonal variations, which were about five to ten times larger in the cold seasons (maximum of 10.33 • C for ERA5 and 11.59 • C for ERA-I) compared with the warm seasons (minimum of 1.48 • C for ERA5 and 2.01 • C for ERA-I).

Comparisons of Ta and Ts
Basically, Ta and Ts exhibited similar and reasonable seasonal cycles for both ERA5 and ERA-I. As shown in Figure 10a, b, the differences between Ta and Ts were small during most of the time series, except for in the spring of 2013 and the summer of 2017. The maximum daily differences in Ta and Ts for ERA5 were 15.05 • C and −15.95 • C, respectively, while for ERA-I they were 22.81 • C and −18.14 • C during the entire time series.

Comparisons of Ta and Ts
Basically, Ta and Ts exhibited similar and reasonable seasonal cycles for both ERA5 and ERA-I. As shown in Figure 10a, b, the differences between Ta and Ts were small during most of the time series, except for in the spring of 2013 and the summer of 2017. The maximum daily differences in Ta and Ts for ERA5 were 15.05 °C and −15.95 °C, respectively, while for ERA-I they were 22.81 °C and −18.14 °C during the entire time series. When compared with the buoy observations (Figure 10c), Ta and Ts both exhibited similar patterns with varying amplitudes in both summer and winter. Overall, Ts was generally warmer than Ta, and the differences were as large as 10 °C in winter and 5 °C in summer. The reason for the warmer Ts values may be the monitoring location of the surface temperature sensor on the buoy, which is usually located at the bottom of the IABP buoy's hull. When the buoy is frozen into the sea ice, the values recorded by the surface temperature sensor are in fact the ice's inner temperature, which is warmer than Ta because of the thermal insulation effect of the snow cover and the sea ice. Due to this, the Ts recorded by the buoys may overestimate the surface temperature, which may be responsible for the large cold Ts bias from both the ERA5 and ERA-I. Therefore, the cold Ts biases evaluated in Section 3.2.2 may be overestimated.

Impact of Snow on Ice Surface Temperature
For a typical IABP buoy, the radius of the operation sphere is 35 cm, and the surface temperature sensor is located near the bottom of the buoy (Figure 11a). When the buoys When compared with the buoy observations (Figure 10c), Ta and Ts both exhibited similar patterns with varying amplitudes in both summer and winter. Overall, Ts was generally warmer than Ta, and the differences were as large as 10 • C in winter and 5 • C in summer. The reason for the warmer Ts values may be the monitoring location of the surface temperature sensor on the buoy, which is usually located at the bottom of the IABP buoy's hull. When the buoy is frozen into the sea ice, the values recorded by the surface temperature sensor are in fact the ice's inner temperature, which is warmer than Ta because of the thermal insulation effect of the snow cover and the sea ice. Due to this, the Ts recorded by the buoys may overestimate the surface temperature, which may be responsible for the large cold Ts bias from both the ERA5 and ERA-I. Therefore, the cold Ts biases evaluated in Section 3.2.2 may be overestimated.

Impact of Snow on Ice Surface Temperature
For a typical IABP buoy, the radius of the operation sphere is 35 cm, and the surface temperature sensor is located near the bottom of the buoy (Figure 11a). When the buoys were distributed over sea ice in the Arctic, the surface temperature sensor may be frozen into the ice surface layer or even covered by the snow layer. Usually the surface temperature did not measure the inner ice temperature exactly 35 cm below ice surface because the ice stress can arch the buoy up and make the surface temperature sensor closer to the ice surface (Figure 11b,c). Therefore, the maximum depth of the sensor was no more than 35 cm, but the real depth was hard to determine. When the ice surface was snow free, the measured temperature was regarded as the ice surface temperature. However, when snow was present, the measured temperature was affected by the large heat insulation effect of the snow.
were distributed over sea ice in the Arctic, the surface temperature sensor may be frozen into the ice surface layer or even covered by the snow layer. Usually the surface temperature did not measure the inner ice temperature exactly 35 cm below ice surface because the ice stress can arch the buoy up and make the surface temperature sensor closer to the ice surface (Figure 11b, c). Therefore, the maximum depth of the sensor was no more than 35 cm, but the real depth was hard to determine. When the ice surface was snow free, the measured temperature was regarded as the ice surface temperature. However, when snow was present, the measured temperature was affected by the large heat insulation effect of the snow. The snow simplification issues in the ERA-I and ERA5 parameterization also caused biases (reported to be 5-10 °C), and simplified consistent snow and ice parameters were applied in the assimilation model, for example, setting the sea ice thickness to 1.5 m and the snow thickness to zero [46]. Due to the lack of real snow thickness measurements and products over the Arctic sea ice, it was hard to quantify the snow effect on the temperature measurements along the buoy drift trajectories; therefore, we only discuss the snow's effect for different snow thickness ranges under different air temperature conditions.
If we assume the thermal conductivity of snow to be 0.31 W/m K [51], the typical temperature gradient of snow would be 0.1-0.5 °C/cm [52]. We used mean gradients (0.2 °C/cm, 0.35 °C/cm, 0.5 °C/cm, and 0.65 °C/cm for the air temperature cases of −10 °C, −20 °C, −30 °C, and −40 °C, respectively) to estimate the possible temperature effect caused by the snow cover. If snow covered on the ice surface was 5 cm thick, then the biases of ice surface temperature between the reanalyzes and the IABP measurements were overestimated by1 °C for the −10 °C case and 3.25 °C for the −40 °C case. The same calculations were applied to other cases and the detailed results are presented in Table 6. As we can conclude from Table 6, the lower the temperature and the deeper the snow thickness, the bigger the overestimation caused by snow cover.  The snow simplification issues in the ERA-I and ERA5 parameterization also caused biases (reported to be 5-10 • C), and simplified consistent snow and ice parameters were applied in the assimilation model, for example, setting the sea ice thickness to 1.5 m and the snow thickness to zero [46]. Due to the lack of real snow thickness measurements and products over the Arctic sea ice, it was hard to quantify the snow effect on the temperature measurements along the buoy drift trajectories; therefore, we only discuss the snow's effect for different snow thickness ranges under different air temperature conditions.
If we assume the thermal conductivity of snow to be 0.31 W/m K [51], the typical temperature gradient of snow would be 0.1-0.5 • C/cm [52]. We used mean gradients (0.2 • C/cm, 0.35 • C/cm, 0.5 • C/cm, and 0.65 • C/cm for the air temperature cases of −10 • C, −20 • C, −30 • C, and −40 • C, respectively) to estimate the possible temperature effect caused by the snow cover. If snow covered on the ice surface was 5 cm thick, then the biases of ice surface temperature between the reanalyzes and the IABP measurements were overestimated by1 • C for the −10 • C case and 3.25 • C for the −40 • C case. The same calculations were applied to other cases and the detailed results are presented in Table 6. As we can conclude from Table 6, the lower the temperature and the deeper the snow thickness, the bigger the overestimation caused by snow cover.

Bias Variations in the Different SIC Ranges
The buoys drifted in the different SIC (sea ice concentration) regions due to the growth or retreat of the sea ice extent and the ocean currents. The different SICs had different percentages of water and ice, which dominated the heat exchange between the atmosphere and ocean. Therefore, the assessment of the Ta and Ts may be significantly affected by the SIC, especially when a large grid cell of 12.5-25 km was used in the analysis datasets. For Ta (Figure 12a), 11-19% of the buoys were distributed in regions with SICs ranging from 0 to 20%, which is regarded as open water. In addition, 76-86% of the buoys drifted in areas with SICs of 80-100%, which is regarded as frozen on ice. These two situations accounted for more than 95% of the buoy positions in this study. It is obvious that the biases of the Ta became larger when the buoy locations had SICs of 20-80% because large uncertainties existed when the surface type within a single grid cell was not homogeneous but was a mixture of water and ice.

Bias Variations in the Different SIC Ranges
The buoys drifted in the different SIC (sea ice concentration) regions due to the growth or retreat of the sea ice extent and the ocean currents. The different SICs had different percentages of water and ice, which dominated the heat exchange between the atmosphere and ocean. Therefore, the assessment of the Ta and Ts may be significantly affected by the SIC, especially when a large grid cell of 12.5-25 km was used in the analysis datasets. For Ta (Figure 12a), 11-19% of the buoys were distributed in regions with SICs ranging from 0 to 20%, which is regarded as open water. In addition, 76-86% of the buoys drifted in areas with SICs of 80-100%, which is regarded as frozen on ice. These two situations accounted for more than 95% of the buoy positions in this study. It is obvious that the biases of the Ta became larger when the buoy locations had SICs of 20-80% because large uncertainties existed when the surface type within a single grid cell was not homogeneous but was a mixture of water and ice. For the Ts from the ERA-I and ERA5 (Figure 12b), the distribution of the biases relative to the different SIC values was significantly larger in the high SIC range (80-100%), in which the buoys are believed to be frozen in the ice. Therefore, the larger biases may be related to the snow cover on the ice. As we discussed above, the snow cover would significantly increase the ice surface temperature because of its large heat insulation effect. Therefore, the surface temperature was sensitive to the surface conditions in the Arctic. For the Ts from the ERA-I and ERA5 (Figure 12b), the distribution of the biases relative to the different SIC values was significantly larger in the high SIC range (80-100%), in which the buoys are believed to be frozen in the ice. Therefore, the larger biases may be related to the snow cover on the ice. As we discussed above, the snow cover would significantly increase the ice surface temperature because of its large heat insulation effect. Therefore, the surface temperature was sensitive to the surface conditions in the Arctic.

Bias Variations between Different Buoy Projects
For the ERA5 data (Table 7), the total number of matching pairs was 778,331 for the Ta evaluation, among which 68% were for IABP, 22% were for MOSAiC, and 10% were for CRREL. However, the total number of matching pairs was 1,610,662 for the Ts evaluation, among which 84% were for IABP and 16% were for CRREL. Therefore, the IABP was the main source of the buoy observations used in this study. The bias of Ta between the reanalysis and buoy observations was the smallest for MOSAiC (0.12 ± 1.76 • C), followed by CRREL (2.25 ± 2.49 • C), and was the largest for IABP (3.27 ± 4.69 • C). The RMSE values follow the same sequence, i.e., 1.76 • C, 3.36 • C, and 5.72 • C for MOSAiC, CRREL, and IABP, respectively. The evaluated Ts bias was −3.33 ± 3.08 • C for MOSAiC and −4.43 ± 4.27 • C for IABP. The smallest bias and RMSE values for both the Ta and Ts of the ERA5 were for MOSAiC. In addition to having fewer matching pairs than IABP, the reason for this may also be related to the improved equipment and sensors compared with the other two buoy projects, which were conducted in recent years (2019-2020). For the ERA-I Ta evaluation (Table 8), the total number of matching pairs was 71,505, among which 82% were for IABP and 18% were for CRREL. The total number of matching pairs was 160,071 for the Ts evaluation. Generally, the number of buoy observations used for the ERA-I was ten times less than the number used for the ERA5. The Ta from the ERA-I exhibited a warm bias of 1.05 ± 3.60 • C for IABP and 1.60 ± 2.83 • C for CRREL. The RMSEs were 3.75 • C and 3.25 • C for IABP and CRREL, respectively.

Conclusions
When monitoring the rapid climate warming in the Arctic, in-situ measurements, remote sensing products, and reanalysis datasets are the primary datasets used by the scientific community. The in-situ measurements, including drifting buoys, offer the most accurate atmospheric and oceanic data, but are limited by their sparse distributions. Reanalysis datasets are of great significance to filling the unobserved data gaps on temporal and spatial scales, but they need to be validated before being used for further research. In this study, we evaluated the performance of the 2-m air temperature (Ta) and the surface temperature (Ts) in the Arctic produced from the ERA-I and ERA5 reanalysis datasets based on in-situ observations from 910 drifting buoys monitored by three international buoy programs during 2010-2020. The buoy observations used in this study covered the largest area with the longest period ever reported in research related to the Arctic.
First, the Ta from the reanalysis datasets exhibited similar seasonal cycles and high correlation coefficients of 0.95 when compared with the buoy observations. The Ta from the ERA-I exhibited a warm bias of 2.27 ± 3.33 • C, while the Ta from the ERA5 exhibited a warmer bias of 2.34 ± 3.22 • C based on the comparisons of more than 3000 daily matching pairs. The warm Ta biases exhibited monthly variations with the maximum occurring in April. However, the smaller warm biases occurred in the warm season (June, July, and August). Negative biases were found to occur on only one day for ERA-I and four days for ERA5 in multi-year mean daily biases; therefore, warm biases occurred throughout most of the year. Second, the Ts from ERA-I and ERA5 both exhibited similar variation patterns, but the cold bias from ERA-I was larger, especially in winter. The largest cold biases of the ERA-I and ERA5 were up to −10 • C for every winter during 2011-2020. Generally, the cold biases were larger when the temperature was lower. For the monthly comparison, the largest biases occurred in December. However, the biases were normally less than −3 • C during April-October. The warmer Ts may be related to the location of the surface temperature sensor on the buoy, which is located near the bottom of the IABP buoy's hull and may be affected by the snow cover. Based on this point, the cold Ts biases of the ERA-I and ERA5 are probably overestimated.
The small quantity of buoys may have caused the larger biases during 2016-2018, which indicated that enough buoys and cross buoy programs would increase the reliability of validations. The different temporal and spatial scales of the reanalysis datasets and buoy observations also introduced biases into the comparisons. In the future, more unmanned buoys and improved sensor technology will enhance the span and accuracy of the insitu observations. The combination of enough observations and improved assimilation methods will improve the quality of the reanalysis datasets.