#### 3.1. Temporal Characteristics of Model Historical Simulations and Projections

The temporal distributions of Arctic SIEs from historical simulations (hist) and projections (for r45 and r85) from all 12 models are shown in Figure 1, Figure 2 and Figure 3, respectively, while the time series of March and September Arctic SIEs, which usually represent the annual maximum and minimum sea ice cover, are presented in Figure 4. Though variability in the monthly SIE values can be seen, all 12 models depicted distinct seasonal and interannual SIE cycles in all three experimental cases (Figure 1, Figure 2 and Figure 3). The dark gray contour lines in Figure 2 and Figure 3 denote the isoline of 1 × 10^{6} km^{2}, representing the nearly ice-free Arctic state. The timing of the projected nearly ice-free Arctic is examined in more detail in Section 3.4.

A noticeable reduction of the mean Arctic SIE maximum, represented by the March Arctic SIE, could be seen in the model projections, with that of r85 being more pronounced near the end of the 21st century (Figure 4). The March Arctic SIE projections for r85 were more spread out at the end of this century, with one model (MIROC-ESM-CHEM) actually projecting a maximum SIE value of 1.59 × 10^{6} km^{2}, which approaches the Arctic ice-free threshold and is significantly smaller than the values projected by the other models. Given the current and foreseeable future Arctic conditions, March SIE values are not likely to be this low. Further analysis is needed to find the reason behind this projection, but it is beyond the scope of the current study.

The depletion of the Arctic SIE minimum, represented by the September SIE, was captured by the model projections, with a faster rate of SIE decline in the r85 case (Figure 4). All the models projected the Arctic to be ice free by 2065 for the high emission scenario (see Section 3.4 for more details).

While the mean of the Arctic SIE maximums of the hist runs agreed well with the observations, the mean of the Arctic SIE minimums of the hist runs was systematically underestimated compared to the observations (Figure 4).

#### 3.3. Evaluation of Climate Model Simulations and Projections

The time series of September Arctic SIEs are shown in Figure 5 for hist, r45, and r85. Visually, the ability of these 12 global climate models to simulate and/or project the observed interannual variability varied considerably (Figure 5). The HadGEM2-AO model noticeably and systematically underestimated the September SIEs in all three cases, while HadGEM2-ES and MIROC-ESM did so in their historical simulations (Figure 5a–c). On the other hand, the EC-EARTH model tended to systematically overestimate the September SIEs (Figure 5a), and more so for the projections (Figure 5b,c).

The evaluation of the model simulations and projections (for both the RCP4.5 and RCP8.5 scenarios) was carried out by examining the bias, root mean square error (RMSE), and mean absolute error (MAE) of the model SIEs from the observed SIEs during the overlapping periods.

The model bias is the average of the differences between the modeled and observed SIEs, in which positive and negative errors can cancel each other out. The MAE, on the other hand, captures the average of the absolute differences between the modeled and observed SIEs, while the RMSE is the square root of the average of the squared differences between the modeled and observed SIEs. Though the RMSE is commonly used for measuring model errors, Willmott et al. [26] argued that it may overestimate the average model error, and they recommended using the MAE to help overcome such error overestimation. We therefore decided to include all three statistical metrics in our analysis. Mathematically, they are defined as:

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}\left(SIE_i - \hat{SIE}_i\right)$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|SIE_i - \hat{SIE}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(SIE_i - \hat{SIE}_i\right)^{2}}$$

where $SIE$ denotes the modeled SIE time series, $\hat{SIE}$ denotes the observed SIE time series and N is the total number of records within the overlapping period.
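The three metrics described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `error_metrics` and its arguments are hypothetical, and it assumes the modeled and observed series have already been aligned over the overlapping period and share the same units (for example, 10^{6} km^{2}).

```python
import numpy as np


def error_metrics(model_sie, observed_sie):
    """Return bias, MAE, and RMSE of modeled SIE relative to observed SIE.

    Both inputs are assumed (hypothetically) to be equal-length sequences
    covering the same overlapping years, in the same units.
    """
    model = np.asarray(model_sie, dtype=float)
    obs = np.asarray(observed_sie, dtype=float)
    if model.shape != obs.shape:
        raise ValueError("model and observed series must have the same length")

    diff = model - obs                    # signed error; negative = underestimate
    bias = diff.mean()                    # positive and negative errors can cancel
    mae = np.abs(diff).mean()             # average magnitude of the errors
    rmse = np.sqrt((diff ** 2).mean())    # penalizes large errors more heavily
    return {"bias": float(bias), "mae": float(mae), "rmse": float(rmse)}
```

With this sign convention, a model that systematically underestimates the observed SIE yields a negative bias, and the RMSE is never smaller than the MAE for the same series.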

The numerical values of SIE model bias, RMSE, and MAE are listed in Table 2, Table 3 and Table 4, respectively. For each parameter, two different types of diagrams are shown for model bias and RMSE (Figure 6 and Figure 7). Figure 6a and Figure 7a group the results as a function of the experiment types, namely, hist, r45, and r85. They show the spread of the statistical parameters of all 12 models for each experiment type and allow for comparison among different experiment types. Figure 6b and Figure 7b group the results as a function of individual models. They show the spread of the statistical parameters across the different experiments for a given model and allow for comparison among models.

For the historical simulations, HadGEM2-CC yielded the smallest bias magnitude at −0.1166 (10^{6} km^{2}). MPI-ESM-LR was the next closest at 0.2177 (10^{6} km^{2}), while ACCESS10 ranked as third smallest at 0.2878 (10^{6} km^{2}). At the same time, MIROC-ESM had the largest bias magnitude at −2.1654 (10^{6} km^{2}). Eight out of 12 models had negative bias values, with an average model bias value of −0.372 (10^{6} km^{2}) (Table 2). The bias values of the projections were more evenly spread in both positive and negative domains, with an average value of −0.083 (10^{6} km^{2}) for r45 and zero for r85 (Table 3 and Table 4 and Figure 6). There was no obvious systematic change for different experiment types for each model, and HadGEM2-CC appeared to have a smaller bias overall than other climate models (Figure 6b).

MPI-ESM-MR yielded the lowest overall RMSE values for all three cases, with HadGEM2-CC being the second lowest (Table 2, Table 3 and Table 4 and Figure 7). MIROC-ESM had the highest RMSE value for the historical simulation case, while HadGEM2-AO and CESM1-CAM5 had the highest RMSEs for the other two projection cases (Figure 7; see also Table 2, Table 3 and Table 4). The overall spread of all models was smaller in the r85 case (Figure 7a). Similar results were found for the MAE (Table 2, Table 3 and Table 4; Figure not shown).

#### 3.4. First Ice-Free Arctic Summer Year (FIASY)

As discussed earlier, the first occurrence of an ice-free Arctic summer is defined as the first year when the projected summer SIE falls below 1 × 10^{6} km^{2}. The FIASY values were identified based on the model projected SIE values. The results are summarized in Table 5.
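The FIASY definition above amounts to a simple threshold search over a projected time series. The sketch below is one way to express it, under the assumption that the summer SIE is given as one value per year in units of 10^{6} km^{2}; the function name `first_ice_free_year` and its parameters are hypothetical and are not taken from the study.

```python
def first_ice_free_year(years, summer_sie, threshold=1.0):
    """Return the first year whose summer SIE is below `threshold`.

    `summer_sie` is assumed to be in units of 10^6 km^2, so the default
    threshold of 1.0 corresponds to 1 x 10^6 km^2. Returns None when the
    series never drops below the threshold (e.g., no ice-free summer
    before the end of the projection).
    """
    if len(years) != len(summer_sie):
        raise ValueError("years and summer_sie must have the same length")

    # Scan in chronological order and stop at the first crossing.
    for year, sie in sorted(zip(years, summer_sie)):
        if sie < threshold:
            return year
    return None
```

Returning `None` rather than a sentinel year keeps models that never reach the threshold within the projection period clearly separated from those that do.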

The earliest projected FIASY values for the RCP4.5 scenario were 2023, 2025, and 2026 from the MIROC-ESM, HadGEM2-AO, and MIROC-ESM-CHEM models, respectively. The earliest projected FIASY for RCP8.5 was 2023 for HadGEM2-AO. Except for MIROC-ESM and MIROC-ESM-CHEM, the models tended to project earlier FIASYs for RCP8.5 than for RCP4.5 (Table 5). As discussed in the Introduction, it is unlikely that the Arctic summer will be nearly ice free (i.e., an SIE of less than 1.0 × 10^{6} km^{2}) by the early 2020s.

The latest FIASY for RCP8.5 was 2065 by EC-EARTH. The CCSM4 model produced the second latest FIASY for RCP8.5, with a value of 2064, and it never reached the ice-free stage by 2100 for RCP4.5. The MPI-ESM-LR and MPI-ESM-MR models tended to produce FIASY values on the high end in the RCP4.5 runs, while they were closer to the mean in the RCP8.5 runs.

Excluding the values later than 2100, the mean FIASY value for RCP4.5 was 2054 with a spread of 74 years; for RCP8.5, the mean FIASY was 2042 with a spread of 42 years (Table 5). The earliest projected FIASY was 2023 for both scenarios, but the latest FIASY value was reduced from later than 2100 for RCP4.5 to 2065 for RCP8.5. Therefore, the mean of the projected FIASYs was earlier and the range was tighter for the RCP8.5 scenario than for the RCP4.5 scenario. Because the FIASY value from CCSM4 was later than 2100 for the RCP4.5 scenario, the true mean FIASY value for that scenario was later than 2054 and the spread was larger than 77 years.
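The summary statistics in this paragraph can be illustrated with a short sketch. It assumes that "spread" means the range (latest minus earliest FIASY), which is consistent with the RCP8.5 figures quoted above (2065 − 2023 = 42), and that models with no ice-free summer by 2100 are excluded and counted separately. The function name and the example values are hypothetical and are not the values in Table 5.

```python
def fiasy_mean_and_spread(fiasy_by_model, horizon=2100):
    """Summarize FIASY values, excluding models that exceed `horizon`.

    `fiasy_by_model` maps a model name to its FIASY, or to None when the
    model never becomes ice free within the projection period.
    Returns (mean, spread, n_excluded), where spread = latest - earliest.
    """
    reached = [
        year for year in fiasy_by_model.values()
        if year is not None and year <= horizon
    ]
    if not reached:
        raise ValueError("no model reaches an ice-free summer by the horizon")

    mean = sum(reached) / len(reached)
    spread = max(reached) - min(reached)
    n_excluded = len(fiasy_by_model) - len(reached)
    return mean, spread, n_excluded


# Made-up values for illustration only (not from Table 5).
example = {"model_a": 2030, "model_b": 2050, "model_c": 2040, "model_d": None}
mean, spread, n_excluded = fiasy_mean_and_spread(example)
```

Because excluded models would, by definition, have a FIASY later than the horizon, the reported mean and spread are lower bounds whenever `n_excluded` is greater than zero.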

The sensitivity of individual models to these two different scenarios, however, varied greatly. For example, the absolute difference was 48 years for M11 (MPI-ESM-LR) but about two years for M6 (HadGEM2-AO) (Table 5). Both HadGEM2-AO and MIROC-ESM consistently yielded earlier FIASY values in both scenarios compared to the other models. When the differences between the two scenarios were less than 10 years, nearly all the projected FIASYs were before 2050, an indication that before mid-century, the choice of scenario was less important for the future Arctic sea ice projections, as stated by Overland et al. [27].

#### 3.5. Sensitivity of Different Statistical Curve-Fitting Functions

Peng et al. [4] used six commonly used statistical models to curve-fit satellite observational data in order to examine the nature and sensitivity of Arctic sea ice extent trends. In this section, the same approach is utilized to examine the nature of climate model SIE projections and the sensitivity of the FIASY projections to scenarios and models.

As pointed out in [4], the probability that a given statistical curve-fitting model is the best of the sample can be represented by the W-Akaike weights; the model with the highest weight is denoted as the optimized statistical curve-fitting model. The W-Akaike weights were computed by using climate model historical simulations and projections for the following four periods and are captured in Tables S1 and S2: (i) 1979–2008, (ii) 1979–2017, (iii) 1988–2017, and (iv) 2006–2035. The first three periods corresponded to different subsets of the satellite data record (the first 30 years, the whole period, and the last 30 years, respectively), while the fourth period corresponded to the first 30 years of the climate model projections. The weights were computed for both RCP4.5 and RCP8.5 scenarios.
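For readers unfamiliar with Akaike weights, the sketch below shows the standard textbook computation from a set of AIC values, one per candidate curve-fitting model. It is offered as an assumption about the general technique and may differ in detail from the procedure used in [4] (for example, if a small-sample corrected AIC was used). The function name `akaike_weights` is hypothetical.

```python
import numpy as np


def akaike_weights(aic_values):
    """Convert AIC values of candidate models into Akaike weights.

    Standard formulation: delta_i = AIC_i - min(AIC), and
    w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2).
    The weights sum to 1; the largest weight marks the preferred model.
    """
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()               # differences from the best model
    relative_likelihood = np.exp(-0.5 * delta)
    return relative_likelihood / relative_likelihood.sum()
```

Subtracting the minimum AIC before exponentiating leaves the weights unchanged but avoids numerical underflow when the raw AIC values are large.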

Figure 8 (top panel) contains all the FIASY values predicted by the six statistical models (denoted by different shapes) that were curve-fitted with the SIE values for each of the four different periods from each of the 12 climate models, color-coded for the RCP4.5 and RCP8.5 scenarios. The bottom panel shows the counts in two-year bins from 2010 to 2100 based on all FIASY values within each of the four periods.
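The two-year binning used for the bottom panel can be sketched with NumPy's histogram function. This is an illustration under assumptions: the bin edges are taken to start at 2010 and end at 2100 in steps of two years, and the function name `fiasy_bin_counts` is hypothetical.

```python
import numpy as np


def fiasy_bin_counts(fiasy_values, start=2010, stop=2100, width=2):
    """Count FIASY values in fixed-width year bins between start and stop.

    Returns (counts, edges). Each bin is half-open, [edge, edge + width),
    except the last one, which also includes `stop` (NumPy convention).
    Values outside [start, stop] are ignored.
    """
    edges = np.arange(start, stop + width, width)  # 2010, 2012, ..., 2100
    counts, edges = np.histogram(fiasy_values, bins=edges)
    return counts, edges
```

A pronounced peak in the returned counts, such as the one described for the 2006–2035 period, would appear as a single bin holding a large share of the values.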

There were no obviously dominant FIASY peaks except for the one in the last period (2006–2035, the first 30 years of model projections). Generally speaking, the distribution of the FIASY values for the first period (1979–2008), which were mostly based on historical simulations, was fairly broad but tended to be slightly earlier, ranging from 2015 to 2065. For the second period (1979–2017), the distribution was fairly broad and evenly distributed from 2020 to 2075, with no obvious coherence between the RCP4.5 and RCP8.5 scenarios. The coherence between the two scenarios was slightly improved for the third period (1988–2017). For this case, there were two preferred time frames for predicted FIASY values: One was around year 2020, and the other was around year 2080. This is very different from the results found by Peng et al. (2018), where the predicted FIASY values converged to 2037. Again, since the most recent observed annual Arctic sea ice minimum was at 4.15 (10^{6} km^{2}) on 18 September 2019, the prospect of an ice-free Arctic summer in or around 2020 is not realistic.

A more distinct peak at 2034 for both scenarios could be seen in the last considered period (2006–2035); the overall distributions of FIASYs for both scenarios were also more consistent than those of the other three periods.

For all the considered cases, the model SIE simulations and projections were shown to be unlikely to be logarithmic in nature, while the likelihood of being either linear or quadratic was fairly close for the first two periods (Figure 9; see also Tables S1 and S2). The Gompertz statistical model tended to predict earlier FIASYs than is realistic (earlier than 2019 for these two periods); the same can be said for the exponential statistical model for the first period, which gave FIASY values at or earlier than 2019 (Figure 9; see also Tables S3a,b and S4a,b). The linear curve-fitting model seemed to be the preferable choice, more so for the last two periods for the r45 scenario (Figure 9; see also Table S1c,d). The preference for being linear was slightly weaker for all four periods in the r85 scenario, where the increase was more quadratic (Figure 9; see also Table S2c,d).
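To make the curve-fitting step concrete, the sketch below fits a polynomial trend (degree 1 for linear, degree 2 for quadratic) to a 30-year SIE series and extrapolates it to find the first year the fitted curve drops below the ice-free threshold. It covers only two of the six statistical models and is an assumption about the general procedure, not a reproduction of the method in [4]; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def predicted_fiasy(years, sie, degree=1, threshold=1.0, last_year=2100):
    """Fit a polynomial trend to SIE and extrapolate to the first ice-free year.

    `sie` is assumed to be in units of 10^6 km^2. Years are shifted so the
    first year is 0, which keeps the least-squares fit well conditioned.
    Returns the first year after the fitting period whose fitted value is
    below `threshold`, or None if that does not happen by `last_year`.
    """
    years = np.asarray(years, dtype=float)
    sie = np.asarray(sie, dtype=float)
    origin = years.min()
    coeffs = np.polyfit(years - origin, sie, degree)

    future = np.arange(int(years.max()) + 1, last_year + 1)
    fitted = np.polyval(coeffs, future - origin)
    below = np.nonzero(fitted < threshold)[0]
    return int(future[below[0]]) if below.size else None


# Synthetic, exactly linear series for illustration only (not model output).
demo_years = np.arange(1979, 2009)
demo_sie = 7.05 - 0.1 * (demo_years - 1979)
demo_linear = predicted_fiasy(demo_years, demo_sie, degree=1)
```

Repeating the call with `degree=2`, or with other functional forms fitted by a nonlinear least-squares routine, would give the set of candidate FIASY values whose relative support can then be compared.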

By using the range of the FIASY values estimated from the projections shown in the previous subsection (shaded bars in the top panel of Figure 9), we could assess the feasibility of the FIASY values predicted by the optimal statistical models. For the first 30 years of the climate model projections (2006–2035), the FIASY values predicted by the optimal statistical model fell within the projected ranges for nine out of 11 climate models in the RCP8.5 scenario (one with the FIASY value later than year 2100); however, this only occurred for two climate models in the RCP4.5 scenario (Figure 9, top panel). In contrast, six out of 10 models fell within the projected ranges in the RCP4.5 scenario for the last 30 years of the satellite data (1988–2017), but only four out of eight did so in the RCP8.5 scenario. With just two climate models from either scenario falling within the projected ranges, the whole satellite record period of 1979–2017 turned out to have the least predictive skill for all six statistical models. For the first 30-year period (1979–2008) of satellite observations, four models fell within the projected ranges for the RCP8.5 scenario, but only one did so for the RCP4.5 scenario.