Seasonal Prediction of Arctic Summer Sea Ice Concentration from a Partial Least Squares Regression Model

: The past decade has witnessed a rapid decline in the Arctic sea ice and therefore has raised a rising demand for sea ice forecasts. In this study, based on an analysis of long-term Arctic summer sea ice concentration (SIC) and global sea surface temperature (SST) datasets, a physical–empirical (PE) partial least squares regression (PLSR) model is presented in order to predict the summer SIC variability around the key areas of the Arctic shipping route. First, the main SST modes closely associated with sea ice anomalies are found by the PLSR method. Then, a prediction model is reasonably established on the basis of these PLSR modes. We investigate the performance of the PE PLSR model by examining its reproducibility of the seasonal SIC variability. Results show that the proposed model turns out promising prediction reliability and accuracy for Arctic summer SIC change, thus providing a reference for the further study of Arctic SIC variability and global climate change. and the route of the Arctic shipping route, this study investigates the spatial and temporal distribution of sea ice in the following key sea areas of the Arctic shipping route, including (1) the Barents Sea and the Kara Sea in the Northeast Channel; (2) the East Siberian Sea and the Laptev Sea in the Northeast Passage; and (3) the Beaufort Sea and the Canadian Arctic Archipelago in the Northwest Passage. These areas nearly cover the entire Arctic waterway. Studying the characteristics


Introduction
The sharp melting of Arctic summer sea ice is a striking indicator of global climate change [1][2][3]. Observations show an accelerating thinning and reduction in Arctic summer sea ice concentration (SIC) during recent decades [4][5][6], with large decadal and seasonal variability [7].
Previous research showed that changes in sea surface temperature (SST) and largescale atmospheric circulations can contribute to the Arctic sea ice depletion [8,9] due to enhanced atmospheric and oceanic heat transports [10][11][12]. Research also found that the North Atlantic Oscillation (NAO) or Arctic Oscillation (AO) may play a critical role [13][14][15]. In addition, the multi-decadal variability in SST, characterized by the Pacific Decadal Oscillation (PDO) in the North Pacific Ocean and the Atlantic Multidecadal Oscillation (AMO) in the North Atlantic Ocean, has also been confirmed to have an influence on the Arctic sea ice retreat [16][17][18].
It can undoubtedly have a great impact on those creatures that thrive in the polar region [19]. At the same time, sea ice can regulate the energy exchange between the ocean and the atmosphere, which has a profound impact on local and remote climate change [20,21]. The reduction in sea ice will directly change the structure of the upper Arctic Ocean, and then lead to the surface temperature anomaly in the Arctic region through the ice-sea-air interaction [22]. It will also affect the temperature gradient between the polar and the mid-latitude and then weaken the westerly wind in the mid-latitude, which will eventually result in an increase in the probability of extreme weather events in the extra-polar regions through its amplification effect [23]. However, on the other hand, the recession of sea ice will increase the possibility of opening the Arctic shipping route, which is of great benefit to polar scientific investigation, ocean transportation and other polar marine activities.
The rapid reduction in Arctic sea ice call for forecasting needs, and thus piqued interest in predictions of Arctic sea ice at decadal and seasonal timescales [24][25][26]. Highly coupled atmosphere-ocean dynamical models have been applied to investigate the longterm downward trend. These models have degrees of success in simulating the decadal variations [27][28][29][30], and many scientists have made great efforts in seasonal forecasting of the Arctic summer sea ice variability [31] since polar maritime activities are basically conducted during summer time. However, there are unavoidable systematic errors and uncertainties in the dynamic model, which cannot truly simulate and predict the seasonal variation of Arctic sea ice. Even the most advanced numerical model can hardly accurately and objectively describe the complex physical processes in the variation of Arctic sea ice [32,33]. At present, the operational prediction system using statistical models at seasonal timescales cannot fully meet the application needs in practice [34][35][36][37][38]. It is evident that the prediction accuracy for seasonal sea ice variability is insufficient, and it is necessary to design better prediction models.
A physical-empirical (PE) statistical model was constructed on the basis of physical considerations, which is different from a simple statistical prediction model [39,40]. In the process of building the traditional linear PE statistical model, the multiple collinearity between different predictors often makes the regression coefficient unreliable, which leads to inaccurate prediction results. The partial least squares regression (PLSR) method can eliminate multiple collinearity among different factors and allow regression modeling in the case of there being a large number of predictive factors [41][42][43]. This method was first proposed by Herman Wold in the 1960s. At present, the PE PLSR model has been widely used in different fields, such as computational biology [44], chemistry [45][46][47] and neuroimaging [48]. In recent years, this technology has also been applied in geoscience and meteorological statistical prediction, and has achieved remarkable results in some extreme weather and climate events prediction [49][50][51][52]. Previous studies have shown that the PE PLSR model is superior to some simple regression models in effectively extracting information from the prediction factors; it also has better forecasting techniques and model stability.
In view of the urgency of predicting Arctic summer sea ice change and the superiority of the PE PLSR method, in this work, we developed a PE PLSR prediction model for the summer sea ice and obtained reasonably great accuracy. The remainder of this article is structured as follows: A brief description of the data and method is present in Section 2, including a detailed modeling process. In what follows we describe the selection of the key areas, where Arctic sea ice varies greatly in summer. In Section 4 we assess the predictors and find the PLSR modes, while a PE PLSR prediction model was built in Section 5. The article closes with a summary of the main results in Section 6.

Data
The main datasets employed in this study include the following: (1) Monthly SIC data from Hadley Centre Sea Ice and SST dataset for the period of 1979-2017 [53]. The numerical value represents the share of sea ice cover area in each grid area. If the total grid area can be represent by 10, then M (M = 0, 1, 2 . . . , 10) represents the share of the area covered by sea ice. M = 10 when it is covered by sea ice and 0 when there is no sea ice at all; (2) Monthly SST from the Extended Reconstructed SST Version 3b (ERSST.V3b) for the period of 1979-2017, collected from the National Oceanic and Atmospheric Administration (NOAA) [54]; (3) AO index calculated from the NOAA Climate Prediction Center (CPC) and defined as the first leading mode from the EOF analysis of monthly mean height anomalies at 1000 hPa; and (4) Monthly mean 100 hPa, 500 hPa, 850 hPa winds, corresponding geopotential height fields and sea level pressure for the period of 1979-2017 obtained from the National Centers for Environment Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis [55].

Model Introduction
(1) Modeling Principle and Procedure of the PE PLS Method Suppose we want to predict a time series Y with n-year observations (subscripts i represent specific years, ranging from 1 to n), which can be represented by a column vector. Firstly, we need to select a predictors field with physical significance based on certain physical analysis, which contains m lattice points and has the same time dimension with the predictand Y. Therefore, the predictive factor field can be expressed as a matrix (each grid point can be written as X ij ; subscripts i represent specific years and ranges from 1 to n and subscripts j represent members and ranges from 1 to m). That is, In order to facilitate the analysis, the forecast quantity and the prediction factor field are standardized with respect to their corresponding time period before the analysis. The standardized predictors field is represented by X 0 , and Y 0 represents the standardized predictand.
In the process of modeling, we need to put all the data of each grid point in the whole factor field into the model for training. Since there are too many predictors, the PE PLSR regression model focuses on finding the main components or potential variables of PLSR. That is to say, the PLSR method can find out the comprehensive variables Z that have the highest variance contribution to both the prediction factors and the forecast amount. Z is the component explaining both the predictors and the predictand. The components are orthogonal to each other. It is a linear combination of predictive factors, i.e., PLSR eigenvectors or PLSR principal components. The process of model building is as follows: In the first step, the forecast quantity Y 0 and each group of forecast factors X 0j (j = 1, 2, . . . , m) are regressed by the least square method, and the regression coefficient matrix is recorded as b 1j , thenŶ 0 = b 1j X 0j . It is a linear combination of the predictors X 0j . It is expected that it can interpret as much information as possible contained in the predictors field, and at the same time, it is required to have the best explanatory ability to the predictand Y 0 , which is max m ∑ j=1 r 2 Y 0 , X 0j Var X 0j . In the second step, regress the predictors field X 0j to b 1j , so that the first comprehensive (ω is the weight coefficient) that can interpret X 0 and Y 0 simultaneously can be obtained. In the third step, considering that Z 1 can only explain part of the information about the predictors field X 0 , regress Z 1 to X 0j and the residual matrix X 1 (X 1 = X 0 − r 1 Z 1 ) (r 1 represents the regression coefficient vector) is calculated. Corresponding residual estimates are also made for information not explained by Z 1 in forecast Y 0 . Return Y 0 to Z 1 and get Y 1 . Last but not least, the second synthetic variable Z 2 is extracted by repeating the above steps with the obtained Y 1 and X 1 until the Z component can explain the maximum covariance between the predictors and the predictand. At this time, the simulation effect of the model can hardly be improved by extracting more comprehensive variables, and the algorithm terminates. The number of comprehensive variables extracted was determined by a cross-validity test. In this model, the order of modal arrangement is in the order of variance of the interpreted predictand Y Finally, the prediction equation Y = β 1 Z 1 + β 2 Z 2 + . . . + β k Z k (k represents the number of comprehensive variables extracted, b 0 represents the predicted y-intercept) is obtained by multiple linear regression of the predictand Y to the extracted comprehensive variable Z. Because Z is a linear combination of X, the equation of predictand Y about predictors X can be written finally. The expression of the PE PLSR model is Y = α 1 X 1 + α 2 X 2 + . . . + α m X m .
Unlike the Empirical Orthogonal Function (EOF) method, the PLSR method can help us to find the PLSR component that can best explain the covariance between the predictors and predictand. We use the PLSR components that represent the predictors and predictand, rather than the predictors themselves, to build the regression models. That is to say, the principal mode obtained by the regression method can not only explain the variance contribution of the mode to the change of predictors, but also explain the change in prediction quantity closely related to it. While, the EOF method does not consider the variance of the predictand Y, but only the variance of the predictors X. For example, when analyzing the relationship between regional precipitation (Y) and the large-scale SST field (X), the first PLSR mode of SST can explain the variance in precipitation and SST to the greatest extent, but it cannot guarantee that the variance in SST is greater than that of the EOF first mode alone. Note that although the EOF first mode has the largest variance in explaining SST, it does not necessarily have a good relationship with regional precipitation. The PLSR mode is calculated according to the common variance of explanation (X, Y), and the mode obtained by the PLSR method contains more predictable information, which is the main reason why the PLSR analysis is used in this study [56].
(2) Advantages of the PLSR Model Among the statistical prediction modeling methods, the most commonly used is the ordinary linear regression method. In the process of model building, the results are often unreliable due to the limitations of linear regression. The unavoidable problem is that there are serious multiple linear correlations among the selected predictors, which results in some information not being parsed and lost, and cannot be integrated into the model. This multiple correlation will endanger the parameter estimation, expand the model error and seriously destroy the stability of the model. The PE PLSR regression model can extract the comprehensive variable with the strongest explanatory variance of the prediction factor by decomposing and screening the numerous data information in the system through the PLSR method. At the same time, it ensures that the extracted comprehensive variable has the greatest correlation with the forecast quantity. By finding the intermediate variables that have a good relationship with the predictors, the adverse effects of multiple correlations among variables on the model construction can be weakened. Besides, the PE PLSR model will contain all the original variables, which makes the prediction results of the model more reliable. Generally speaking, in general linear regression methods, the number of samples needed for modeling cannot be too small (not less than the number of predictive factors), while the PLSR model allows for modeling when the number of predictive samples is less than the number of predictive factors. Therefore, the PE PLSR method will be used to construct the PE PLSR model to predict the seasonal variation in Arctic sea ice in summer.

Introduction to the Research Areas: The Arctic Shipping Route
The Arctic shipping route is a maritime channel that crosses the Arctic Ocean and connects the Atlantic Ocean with the Pacific Ocean. There are two Arctic shipping routes. The Northern Sea Route refers to the channel that starts from Nordic Europe and goes eastward through the Arctic Ocean, the Barents Sea, the Kara Sea, the Laptev Sea, the East Siberian Sea and the Chukchi Sea to the Bering Strait and the Pacific Ocean. This waterway connects East Asia and northwest Europe. Compared with the traditional waterway, it has a shorter voyage and is of great significance to the shipping industry in East Asia, especially in China. The Northwest Passage begins with the Bering Strait and runs eastward along the North Alaska offshore (Beaufort Sea) through the Canadian Arctic Islands to the Davis Strait. This route is the shortest between the Atlantic and the Pacific Ocean. It saves one third of the distance and time compared with the traditional Panama Canal. Therefore, the commercial navigation of this route will bring convenience to trade between North America and Asia.
With the warming of the global climate, the Arctic sea ice is melting faster. The Arctic Route has been partially opened now, and the strategic position of the Arctic is becoming increasingly important. In addition to abundant natural resources, convenient Arctic waterways are becoming "new ties" and "highways" connecting the Atlantic and Pacific Oceans. With the increasing possibility of opening the Arctic waterway, Arctic marine activities, such as polar scientific investigation, polar fishing, oil and gas resources development, will become more frequent. The existence of sea ice is the main factor affecting the safety and efficiency of Arctic activities. Therefore, seasonal prediction of sea ice in summer is of great significance, which has become an urgent need for polar navigation and activities support. Figures 1 and 2 show the spatial distribution of the SIC climatology and standard deviation from May to October for the period 1979 to 2017, respectively. As can be seen from Figure 1, the Arctic sea ice is in a continuously melting period from June to September and begins to condense in October. It shows that the low-latitude sea ice begins to melt first, and gradually forms an ice-free circle in the marginal sea. In September, the sea ice extent was the smallest. At this time, the Bering Sea, Kara Sea, Baffin Bay and the coastal areas of Chukchi Sea, East Siberian Sea, Laptev Sea and Barents Sea have almost no sea ice coverage. The high-latitude sea ice in the Beaufort Sea, Chukchi Sea, East Siberia Sea, Laptev Sea and Barents Sea are also significantly reduced during early summer, and the areas covered by perennial ice is gradually shrinking, as it retreats to the middle of the Arctic Ocean (within 75 • N). In September, the sea ice in the whole coastal zone disappears and forms a wide open water body in the coastal sea area.    We describe the sea ice variability by calculating the standard deviation of the SIC. As shown in Figure 2, the spatial distribution of the SIC variability mainly shows a zonal distribution, and the high-value area mainly located near the ice boundary of the sea ice climatology. The summer sea ice anomaly centers basically appear in the Kara Sea, Laptev Sea, East Siberian Sea, Chukchi Sea and Beaufort Sea. In late summer, there is no ice on the North Pacific side, so there is no sea ice variability. On the North Atlantic side, a sea ice anomaly exists only in the vicinity of the Bay of Baffin, the Greenland Sea and other high-latitude areas. Looking at Figures 1 and 2, we can see that the sea ice has been in a melting stage from June to September. Low-latitude marginal sea ice around the coastal areas of Bering Sea, Kara Sea, Baffin Bay and Barents Sea melted completely in June and July. In the later period, there was no sea ice cover in these areas, and the sea ice variability was zero; but, this "ice-free" state has a great impact on shipping and climate change. Therefore, in this paper we chose the average sea ice from June to September (the whole melting period of sea ice) for in-depth discussion.

Characteristics of Arctic Sea Ice Variation and Key Area Selection
We calculated the average SIC standard deviation from June to September for the period 1979-2017, and its spatial distribution is shown in Figure 3a. According to the distribution of the large variability centers in different sea areas, the regional average sea ice indices are calculated, and the correlation coefficients between them can be obtained in Figure 3b. Combining sea ice in regions where the correlation coefficients pass a significant t-test, several key areas of sea ice anomalies (maximum sea ice variability) were determined. Therefore, considering the key areas of sea ice anomalies and the route of the Arctic shipping route, this study investigates the spatial and temporal distribution of sea ice in the following key sea areas of the Arctic shipping route, including (1)  of sea ice changes in these sea areas in summer and making timely prediction of their anomalous changes can provide much-needed ice information for Arctic activities.
significant t-test, several key areas of sea ice anomalies (maximum sea ice variability) were determined. Therefore, considering the key areas of sea ice anomalies and the route of the Arctic shipping route, this study investigates the spatial and temporal distribution of sea ice in the following key sea areas of the Arctic shipping route, including (1) the Barents Sea and the Kara Sea in the Northeast Channel; (2) the East Siberian Sea and the Laptev Sea in the Northeast Passage; and (3) the Beaufort Sea and the Canadian Arctic Archipelago in the Northwest Passage. These areas nearly cover the entire Arctic waterway. Studying the characteristics of sea ice changes in these sea areas in summer and making timely prediction of their anomalous changes can provide much-needed ice information for Arctic activities. We defined the time series of the regional average SIC variability in the Barents Sea-Kara Sea (BK), East Siberian Sea-Laptev Sea (EL) and the Beaufort Sea-Canadian  Figures 4-6), which is the number of standard deviations away from the average regional JJAS (the melt-season) sea ice concentration. Then we defined the detrended sea ice indices as the sea ice index (SII), since the focus here is on the interannual variations of the SIC rather than the long-term trend (blue carves in the Figures 4-6). We calculated the correlation of these indices in June, July, August and September, and the correlation coefficients have all passed a certain significant Student's t-test (Figures 4-6). Specifically, sea ice is in a melting state in these four months. It is reasonable to study and discuss the average SIC of the four months. , which is the number of standard deviations away from the average regional JJAS (the melt-season) sea ice concentration. Then we defined the detrended sea ice indices as the sea ice index (SII), since the focus here is on the interannual variations of the SIC rather than the long-term trend (blue carves in the Figures 4-6). We calculated the correlation of these indices in June, July, August and September, and the correlation coefficients have all passed a certain significant Student's t-test (Figures 4-6). Specifically, sea ice is in a melting state in these four months. It is reasonable to study and discuss the average SIC of the four months.

The Leading Predictable Modes
Identifying the predictors is the first step in the model specification. As far as the rapid reduction of Arctic sea ice is concerned, its mechanism is mainly related to global warming. One of the most important global warming progresses is related to the rapid warming of the tropical and extra-tropical oceans. The predictability originates from the interactions between the atmosphere and ocean. Therefore, the abnormal changes in the SST and the related atmospheric circulation anomalies may be the key factors leading to the rapid melting of Arctic sea ice.
In this study, the seasonal leading role of the SST for the SIC is used to build statistical seasonal prediction models, and the PLSR method is used to find the spatial distribution of the SST anomalies before the SIC anomalies occur. First, the spring (March-April-May) SST anomaly field is used as predictors to establish a prediction equation, when the relationships between the predictors and the predictand are generally the greatest. Based on the measured SII, which is standardized according to its corresponding time period and the previous data of the SST field, the key area of SST that has the best correlation with sea ice anomaly is selected through multiple sensitivity tests to determine the sea area range of the prediction factor field Xij. Here, subscript j denotes the number of grid points in the selected critical area of SST after eliminating the missing data. Forecast Yi is the subsequent SII, which is a predictable one dimensional

The Leading Predictable Modes
Identifying the predictors is the first step in the model specification. As far as the rapid reduction of Arctic sea ice is concerned, its mechanism is mainly related to global warming. One of the most important global warming progresses is related to the rapid warming of the tropical and extra-tropical oceans. The predictability originates from the interactions between the atmosphere and ocean. Therefore, the abnormal changes in the SST and the related atmospheric circulation anomalies may be the key factors leading to the rapid melting of Arctic sea ice.
In this study, the seasonal leading role of the SST for the SIC is used to build statistical seasonal prediction models, and the PLSR method is used to find the spatial distribution of the SST anomalies before the SIC anomalies occur. First, the spring (March-April-May) SST anomaly field is used as predictors to establish a prediction equation, when the relationships between the predictors and the predictand are generally the greatest. Based on the measured SII, which is standardized according to its corresponding time period and the previous data of the SST field, the key area of SST that has the best correlation with sea ice anomaly is selected through multiple sensitivity tests to determine the sea area range of the prediction factor field X ij . Here, subscript j denotes the number of grid points in the selected critical area of SST after eliminating the missing data. Forecast Y i is the subsequent SII, which is a predictable one dimensional time series. The comprehensive variable Z can be understood as the main distribution pattern of SST, which is closely related to sea ice change revealed by the PLSR method; also, the SST field and sea ice have the best explanatory ability in the early stage.
The PLSR method is used to find the best SST mode with the best correlation between the SST in the early spring and the BK SII. As can be seen from Figure 7, the first two modes of PLSR have statistical significance. The first mode can explain 15.48% of the SST change and 53.84% of the sea ice change. The second mode can explain 20.79% of the SST change and 19.11% of the sea ice change. The first mode has an interdecadal variation similar to the AMO [57], and the corresponding time series shows interdecadal variation. The second mode is related to an AO-like or NAO-like SST pattern [58], and its time series shows interannual variation. This reflects the consideration of multi-scale interaction, which makes the prediction results more accurate. These two SST modes can provide a predictable source of sea ice in the BK Sea from June to September.
Atmosphere 2021, 12, x FOR PEER REVIEW 10 of 21 modes. Figure 8 shows the development of the corresponding SST field from early winter to autumn. It can be seen that the second mode of PLSR has an obvious AO distribution pattern in the early spring, and this SST distribution pattern can last until the subsequent summer, which exerts a direct effect on summer sea ice change. It shows that the second mode has a certain degree of persistence, and this stability can provide strong support for prediction.  In order to reveal the physical significance of the above two PLSR SST modes, we analyzed the evolution characteristics of the SST field corresponding to the two PLSR SST modes. Figure 8 shows the development of the corresponding SST field from early winter to autumn. It can be seen that the second mode of PLSR has an obvious AO distribution pattern in the early spring, and this SST distribution pattern can last until the subsequent summer, which exerts a direct effect on summer sea ice change. It shows that the second mode has a certain degree of persistence, and this stability can provide strong support for prediction. We also calculated the lead-lag correlation between the time series corresponding to the two modes and AO index (Figure 9a). The two modes are first presented in Figure  7, in which case the predictand is spring SSTs and summer SIC, which are constant. While, the AO index is being averaged for each season, which can be regard as variable. It can be found that the correlation between the second mode time series and AO index passed the 95% significance t-test in the early spring, which shows that the second mode is obviously related to the change in AO. We linearly regressed the time series with prior spring U and V winds at the 850 hPa levels and SLP; the responses of the atmospheric circulation suggest an obvious AO-positive phase (Figure 9b). Low-pressure anomalies occur in the polar region while high-pressure anomalies are located in the middle latitudes. It indicates that the sea ice changes in the BK Sea may be closely related to the  (Figure 7b,d) and the sea surface temperature anomaly ( • C) in (a,e) early winter, (b,f) spring, (c,g) summer and (d,h) autumn, for the period 1979-1999. The color area was the area that passed the significance t-test at the 99% (dark) and 95% (light) confidence levels, respectively (Barents-Kara SII).
We also calculated the lead-lag correlation between the time series corresponding to the two modes and AO index (Figure 9a). The two modes are first presented in Figure 7, in which case the predictand is spring SSTs and summer SIC, which are constant. While, the AO index is being averaged for each season, which can be regard as variable. It can be found that the correlation between the second mode time series and AO index passed the 95% significance t-test in the early spring, which shows that the second mode is obviously related to the change in AO. We linearly regressed the time series with prior spring U and V winds at the 850 hPa levels and SLP; the responses of the atmospheric circulation suggest an obvious AO-positive phase (Figure 9b). Low-pressure anomalies occur in the polar region while high-pressure anomalies are located in the middle latitudes. It indicates that the sea ice changes in the BK Sea may be closely related to the changes in AO in the early spring, and the anomalous signals of AO may lead to the anomalous changes in sea ice in the BK Sea from June to September in summer.
Atmosphere 2021, 12, x FOR PEER REVIEW 12 of 21 changes in AO in the early spring, and the anomalous signals of AO may lead to the anomalous changes in sea ice in the BK Sea from June to September in summer. Figure 9. (a) Lead-lag correlation between the time series of the first two modes of the PLSR and AO index. In the diagram, "−1" represents the previous year, "0" represents this year, and "1" represents the next year. The horizontal dotted line represents the critical line with a statistical confidence level at the 95% level, based on a significant t-test, and the intersection point of the curve and vertical line represents this summer (JJA(0)); (b) regression of the PLSR second mode to SLP (hPa) and 850 hPa U and V wind field (m/s) in the early spring, only drawing a wind field passing the 95% significance t-test.
Similarly, we use the PLSR method to find the best SST modes for the EL and BC sea ice indices. The results are shown in Figures 10 and 11. As can be seen from Figure  10, sea ice in the EL Sea is associated with the PDO-like SST modes [59]. Specifically, the Mode 1 matches a positive PDO signal for the northern Pacific only. Mode 2 matches a positive PDO signal for the tropical Pacific and Indian Ocean but not the northern Pacific-especially not east of Japan; that is a bit more El Niño-like. Mode 3 looks the most PDO-like overall (a negative PDO pattern/La Niña). The first three modes of PLSR have statistical significance, totally accounting for 47.43% of the SST changes and 87.2% of the SII changes. As for the BC SII, the first two SST modes have statistical significance, explaining the variance contribution of 43.08% SST change and 49.26% SSI change ( Figure  11). As can be seen from the chart, sea ice in the BC area is associated with SST modes similar to PDO or ENSO. The first mode shows the cold SST in the tropical region and the equatorial Pacific region. In addition to the horseshoe-shaped SST distribution in the Northwest Pacific, the second mode also has significant, warm SST signals in the tropics, which is more consistent with PDO or ENSO-especially an El Niño pattern [60]. These SST modes can provide predictive sources for prediction. The corresponding evolution patterns of the SST modes are shown in Figures 12 and 13.
These SST modes searched by the PLSR method indicate the precursory signals of the sea ice anomalies in early spring ahead of summer. They contain the information of the interannual to interdecadal oscillation signals (multi-scale interaction), such as AO, ENSO, PDO and AMO, which have a certain persistence. They can be used as the prediction factor field of the prediction model. Therefore, in the next study, we will use these SST modes to further establish the PE PLSR model to predict the SII. Figure 9. (a) Lead-lag correlation between the time series of the first two modes of the PLSR and AO index. In the diagram, "−1" represents the previous year, "0" represents this year, and "1" represents the next year. The horizontal dotted line represents the critical line with a statistical confidence level at the 95% level, based on a significant t-test, and the intersection point of the curve and vertical line represents this summer (JJA(0)); (b) regression of the PLSR second mode to SLP (hPa) and 850 hPa U and V wind field (m/s) in the early spring, only drawing a wind field passing the 95% significance t-test.
Similarly, we use the PLSR method to find the best SST modes for the EL and BC sea ice indices. The results are shown in Figures 10 and 11. As can be seen from Figure 10, sea ice in the EL Sea is associated with the PDO-like SST modes [59]. Specifically, the Mode 1 matches a positive PDO signal for the northern Pacific only. Mode 2 matches a positive PDO signal for the tropical Pacific and Indian Ocean but not the northern Pacific-especially not east of Japan; that is a bit more El Niño-like. Mode 3 looks the most PDO-like overall (a negative PDO pattern/La Niña). The first three modes of PLSR have statistical significance, totally accounting for 47.43% of the SST changes and 87.2% of the SII changes. As for the BC SII, the first two SST modes have statistical significance, explaining the variance contribution of 43.08% SST change and 49.26% SSI change ( Figure 11). As can be seen from the chart, sea ice in the BC area is associated with SST modes similar to PDO or ENSO. The first mode shows the cold SST in the tropical region and the equatorial Pacific region. In addition to the horseshoe-shaped SST distribution in the Northwest Pacific, the second mode also has significant, warm SST signals in the tropics, which is more consistent with PDO or ENSO-especially an El Niño pattern [60]. These SST modes can provide predictive sources for prediction. The corresponding evolution patterns of the SST modes are shown in Figures 12 and 13.  Atmosphere 2021, 12, x FOR PEER REVIEW 13 of 21        (Figure 10b,d,f) and the sea surface temperature anomaly ( • C) in (a,e,i) early winter, (b,f,j) spring, (c,g,k) summer and (d,h,l) autumn, for the period 1979-1999. The color area was the area that passed the significance t-test at the 99% (dark) and 95% (light) confidence levels, respectively (East Siberian Sea and the Laptev SII).   These SST modes searched by the PLSR method indicate the precursory signals of the sea ice anomalies in early spring ahead of summer. They contain the information of the interannual to interdecadal oscillation signals (multi-scale interaction), such as AO, ENSO, PDO and AMO, which have a certain persistence. They can be used as the prediction factor field of the prediction model. Therefore, in the next study, we will use these SST modes to further establish the PE PLSR model to predict the SII.

Modeling Process
In this study, the PE PLSR prediction model was established by reconstructing the SII, and independent prediction experiments were carried out. Modeling divides the whole data set into two parts. Firstly, using the data of the first 21 years (1979-1999) for regression training, this time is long enough to be used as the background field of reliable modeling and get the model of the training period. Then, the next 18 years (2000-2017) were taken as the actual forecast period, and the predicted value was compared with the true value to verify the prediction ability of the model.
The predictive equation we established is as follows (SII represents the time series of Arctic SII): SII = SST × Beta + SII residual SII fit = SST × Beta, SII = SII fit + SII residual Here, SST represents the SST field and Beta is the regression coefficient matrix.
The specific process of model building and forecasting ability testing is as follows. During the model training and modeling period, the mean SST in spring 1979-1999 and the measured SII in 1979-1999 were used to fit the model in accordance with the PLSR modeling steps. Then, a stable regression equation of SII with respect to the SST field was established. The SII time series simulated during the training period of the corresponding model was obtained by introducing the data of the annual SST field.
In the actual forecast period of the model, it is assumed that we will forecast the SII in 2000. Firstly, the regression coefficient matrix Beta: SII fit (1979-1999) = SST  × Beta and SII residual (1999) = SII observe (1999) − SII fit (1999) were calculated according to the above prediction equation. Then, the new observed data of the 1996 SST field were input and SII fit (2000) = SST (1999) × Beta, SII (2000) = SII fit (2000) + SII residual (1999) were obtained.
In the process of modeling, through many sensitivity tests, we find that the new Beta and residuals can be obtained by inputting the new SST data and actual flux index into the equation every year, and then putting the new residuals into the model operation can improve the prediction skills of the model. Because of the uncertainty of the model system, the weakening factor is introduced to smooth the system error.

Model Prediction Effect and Performance Evaluation
By calculating the correlation coefficient between the actual SII series and the simulation results, we can judge the fitting ability of the model, and calculate the correlation between the measured SII and the prediction results to obtain the actual prediction skills of the model. As shown in Figure 4, the blue solid line represents the actual SII, the yellow solid line is the SII simulated by the model training in 1979-1999, and the red solid line is the SII actually predicted by the model in 2000-2017. Corr1 is the correlation coefficient between the PE PLSR simulation and the actual SII in 1979-1999, and Corr2 is the correlation coefficient between the PE PLSR prediction and the actual SII in 2000-2017. Based on the first two SST modes, we construct the PE PLSR model to predict the BK SII. As shown in Figure 14, the correlation between the simulation period and the forecast period reaches 0.8541 and 0.8505, and the correlation coefficients pass the significance t-test with a confidence level of 99%, which shows that the established model has a high prediction level in both the simulation and prediction periods.
here 2021, 12, x FOR PEER REVIEW 16 of 21 at or above the 99% confidence level. The prediciton skills are relatively high. What is more, we constructed the PE PLSR model to forecast the SII in the BC area using the first two SST modes. The correlation coefficient between the simulated SII and measured SII is 0.7325 and 0.5960 (Figure 16), which all passed the 99% significance t-test. The model has a good predictive performance for sea ice anomalies in the area of the BC during the target months.   Similarly, based on the first three SST modes, we construct the PE PLSR model to predict the SII of the EL Sea. As shown in Figure 15, the correlation between the measured SII and the model simulated index is 0.9331, and the correlation between the predicted index and the predicted index is 0.6321. All of them are statistically significant at or above the 99% confidence level. The prediciton skills are relatively high. What is more, we constructed the PE PLSR model to forecast the SII in the BC area using the first two SST modes. The correlation coefficient between the simulated SII and measured SII is 0.7325 and 0.5960 (Figure 16), which all passed the 99% significance t-test. The model has a good predictive performance for sea ice anomalies in the area of the BC during the target months. at or above the 99% confidence level. The prediciton skills are relatively high. What is more, we constructed the PE PLSR model to forecast the SII in the BC area using the first two SST modes. The correlation coefficient between the simulated SII and measured SII is 0.7325 and 0.5960 (Figure 16), which all passed the 99% significance t-test. The model has a good predictive performance for sea ice anomalies in the area of the BC during the target months.

Discussion and Conclusions
The summer Arctic SIC has decreased dramatically during the past decade, which may bring about unpredictable climate impact and may accelerate the maritime transport in the Arctic shipping route as well. Accurate prediction of the Arctic summer sea ice draws great interests but remains an ongoing challenge for scientists.
In this study, we carried out the seasonal prediction of summer SIC by statistical approaches using the PLSR model to predict the sea ice variation in three major parts of the Arctic shipping route. This approach combines statistical analysis with physical thinking. Wang et al. [61] showed there is good agreement between the sea ice dataset used here and the more widely used NSIDC record for the satellite period. The data we used are relatively reliable. Although the resolution of the data is not relatively high, the difference may be negligible if we use seasonal average sea ice data over a large spatial range. The previous SST in spring was chosen as predictors to establish the statistical models, and the models were established by the SST modes from the PLSR method. The high absolute values of the correlation coefficient suggest that these models have a fabulous ability in reproducing the sea ice change, and we confirmed that the models provided moderate seasonal prediction results in the lead time of one season. Therefore, the model is recommended for the prediction of Arctic summer SIC.
Although this model can present good simulation results in all three regions, the Barents-Kara region has a stronger relationship in the 2000-2017 period than the Laptev-East Siberian or Beaufort-Canadian Archipelago regions. In other words, the Atlantic side has more predictability than the Pacific side with this method. It is apparent that, while the ice cover in the Arctic is retreating rapidly, the character of the changing ice is very different in the Atlantic and Pacific sectors [62][63][64][65]. A number of studies showed that the Barents Sea is one of the Arctic regions with the largest sea ice variability at different time scales [66][67][68] and the air-sea temperature differences in the Barents Sea are extremely large [69,70]. We can expect that sea ice change in the Barents Sea will greatly affect the heat release, thus affecting the local and possibly large-scale climatic conditions. This will provide a potential for climate prediction [71].
The Arctic will be a window for future climate change, and the Arctic region is very sensitive to climate change. If the Arctic continues to warm, we will encounter extreme

Discussion and Conclusions
The summer Arctic SIC has decreased dramatically during the past decade, which may bring about unpredictable climate impact and may accelerate the maritime transport in the Arctic shipping route as well. Accurate prediction of the Arctic summer sea ice draws great interests but remains an ongoing challenge for scientists.
In this study, we carried out the seasonal prediction of summer SIC by statistical approaches using the PLSR model to predict the sea ice variation in three major parts of the Arctic shipping route. This approach combines statistical analysis with physical thinking. Wang et al. [61] showed there is good agreement between the sea ice dataset used here and the more widely used NSIDC record for the satellite period. The data we used are relatively reliable. Although the resolution of the data is not relatively high, the difference may be negligible if we use seasonal average sea ice data over a large spatial range. The previous SST in spring was chosen as predictors to establish the statistical models, and the models were established by the SST modes from the PLSR method. The high absolute values of the correlation coefficient suggest that these models have a fabulous ability in reproducing the sea ice change, and we confirmed that the models provided moderate seasonal prediction results in the lead time of one season. Therefore, the model is recommended for the prediction of Arctic summer SIC.
Although this model can present good simulation results in all three regions, the Barents-Kara region has a stronger relationship in the 2000-2017 period than the Laptev-East Siberian or Beaufort-Canadian Archipelago regions. In other words, the Atlantic side has more predictability than the Pacific side with this method. It is apparent that, while the ice cover in the Arctic is retreating rapidly, the character of the changing ice is very different in the Atlantic and Pacific sectors [62][63][64][65]. A number of studies showed that the Barents Sea is one of the Arctic regions with the largest sea ice variability at different time scales [66][67][68] and the air-sea temperature differences in the Barents Sea are extremely large [69,70]. We can expect that sea ice change in the Barents Sea will greatly affect the heat release, thus affecting the local and possibly large-scale climatic conditions. This will provide a potential for climate prediction [71].
The Arctic will be a window for future climate change, and the Arctic region is very sensitive to climate change. If the Arctic continues to warm, we will encounter extreme weather more frequently [72,73]. Arctic climate change has profound impacts on the cryosphere, notably via shrinking the sea-ice cover [74]. It is crucial to make reasonable predictions of Arctic summer sea ice, and the results from this paper can help provide more timely and effective forecasting services for Arctic scientific research and marine activities.

Institutional Review Board Statement:
Our study did not involve humans.

Informed Consent Statement:
Our study did not involve humans.
Data Availability Statement: Our data are publicly available, and the specific website can refer to references.

Conflicts of Interest:
The authors declare no conflict of interest.