Remote Sensing Statistical Modeling of Sea Ice Concentration Using Satellite Imagery and Climate Reanalysis Data in the Barents and Kara Seas, 1979–2012

Extensive sea ice over Arctic regions is largely involved in heat, moisture, and momentum exchanges between the atmosphere and ocean. Some previous studies have been conducted to develop statistical models for the status of Arctic sea ice and showed considerable possibilities to explain the impacts of climate changes on the sea ice extent. However, the statistical models require improvements to achieve better predictions by incorporating techniques that can deal with temporal variation of the relationships between sea ice concentration and climate factors. In this paper, we describe the statistical approaches by ordinary least squares (OLS) regression and a time-series method for modeling sea ice concentration using satellite imagery and climate reanalysis data for the Barents and Kara Seas during 1979–2012. The OLS regression model could summarize the overall climatological characteristics in the relationships between sea ice concentration and climate variables. We also introduced autoregressive integrated moving average (ARIMA) models because the sea ice concentration is such a long-range dataset that the relationships may not be explained by a single equation of the OLS regression. Temporally varying 5521 relationships between sea ice concentration and the climate factors such as skin temperature, sea surface temperature, total column liquid water, total column water vapor, instantaneous moisture flux, and low cloud cover were modeled by the ARIMA method, which considerably improved the prediction accuracies. Our method may also be worth consideration when forecasting future sea ice concentration by using the climate data provided by general circulation models (GCM).


Introduction
Sea ice is an important component of the climate system.Sea ice cover can change the surface albedo, which in turn acts to reinforce the initial alteration in ice area [1][2][3].Extensive sea ice over Arctic regions is largely involved in heat, moisture, and momentum exchanges between the atmosphere and ocean [4].This is because the sea ice surface reflects significantly more of the incident solar radiation than open water and because melted water has a significant influence on oceanic circulation [5,6].Therefore, changes in the extent of sea ice have great potential to influence variations in regional and global climatic systems [7,8].
Although a long-range dataset over a large-scale area is necessary to achieve reliable spatiotemporal analysis, it is difficult to scrutinize sea ice changes due to the lack of in-situ observations [9].However, Arctic sea ice concentration, extent, and area have been continuously monitored for approximately 34 years.Monitoring has been ongoing since 1979 with the help of satellite-based multichannel passive microwave imaging systems [10] such as Scanning Multichannel Microwave Radiometer (SMMR), Special Sensor Microwave/Imager (SSM/I), and Advanced Microwave Scanning Radiometer (AMSR).Sea ice concentration is defined as the fraction of ice-covered areas at a given point in the ocean.Sea ice extent denotes the sum of ice-covered areas with concentrations of at least 15%, while sea ice area is the product of the ice concentration and each pixel area within the ice extent [11,12].Recent satellite remote sensing studies show that there has been a significant decline in Arctic sea ice extent [13,14] with the possibility that global warming is occurring more rapidly than before.Therefore, reliable outlooks for sea ice conditions are crucial in understanding the future Arctic environment and global climate change [15].
In the Intergovernmental Panel on Climate Change (IPCC) 4th Assessment Report, six climate models indicated that the Arctic might have sea ice-free summers in the 2030s [16,17].However, significant differences were found among the results predicted by several climate models evaluating changes in Arctic sea ice cover [16,17].Consequently, the mechanism behind the recent rapid decrease in sea ice extent, which is not yet fully understood, may result in some uncertainties in the process-based Arctic sea ice module of climate models [18,19].Some other studies on sea ice changes have employed statistical models based on empirical relationships between sea ice conditions and several explanatory variables e.g., [18,[20][21][22][23][24][25][26].Variables in the statistical models include prior information on sea ice, as well as oceanic and atmospheric conditions that can influence sea ice changes.Temperature is reasonably expected to be a major predictor for the current loss of sea ice caused by the atmospheric warming trend over the Arctic area [27,28].However, statistical predictions require additional climatic variables for a more stable explanation for Arctic sea ice changes [18,29,30].Such empirical knowledge can help identify the physical processes underlying Arctic sea ice changes [16,31] and can contribute to making reasonable outlooks even without the need for explicit physical mechanisms and realistic initial conditions [29].
Some previous studies have been conducted to develop statistical models for the status of Arctic sea ice at seasonal to annual scales and showed considerable possibilities to explain the impacts of climate changes on Arctic sea ice extent.Drobot and Maslanik [20] exploited a statistical model with four regressors (winter multiyear ice concentration, spring total ice concentration, North Atlantic Oscillation index, and East Atlantic index) in order to explain summer ice conditions in the Beaufort Sea.Drobot [21] extended the work of Drobot and Maslanik [20] by adding more explanatory variables like heating degree-days to the regression model for the Beaufort-Chukchi Sea.These works aimed to predict the Barnett severity index (BSI) that shows the expansion of open water using surface and atmospheric variables.Drobot et al. [22] developed a multiple regression model to predict sea ice extent using satellite data such as ice concentration, surface skin temperature, surface albedo, and downward longwave radiative flux as explanatory variables.Drobot [18] also provided several regression equations for the relationship between minimum sea ice extent and satellite-observed surface variables including temperature, albedo, and downward longwave radiation.Lindsay et al. [23] presented a similar regression analysis that predicted sea ice extent by using surface variables for the Arctic Ocean.Årthun et al. [24] showed the correlations between Barents Sea ice area and Atlantic heats.Pavlova et al. [25] analyzed the impacts of winds and sea surface temperatures on the Barents Sea ice extent using correlation coefficients.In addition, Tivy et al. [26] analyzed July sea ice concentration in the Hudson Bay with the help of canonical correlation analysis using sea surface temperature, geopotential height, and surface air temperature.
The characteristics of previous statistical studies on the satellite-observed sea ice change include the following.First, most of their techniques for modeling sea ice changes were somewhat limited by focusing solely on the prediction of sea ice extent [26] using variables such as sea ice concentration, despite the fact that sea ice extent is indeed the value directly determined by sea ice concentration, according to its definition.Rather, sea ice concentration itself has rarely been modeled by statistical methods using satellite imagery except for Tivy et al. [26] whose analyses were conducted simply for July.Second, the statistical models require improvements to achieve better accuracies by incorporating techniques that can deal with temporality or long-term variation of the relationships between sea ice concentration and climate factors.Indeed, 34-year satellite imagery includes 408 time series on the monthly basis.It is quite a long-range dataset, so the relationships between sea ice concentration and climate factors may not be sufficiently explained by a single equation of the ordinary least squares (OLS) method.Instead, time-series statistical approaches such as vector autoregression (VAR) and autoregressive integrated moving average (ARIMA), whose predictabilities have been proved in many other fields, can be an alternative to modeling sea ice changes in terms of temporally varying relationships.
In this paper, we described the statistical modeling of the Arctic sea ice changes in relation to various climatic factors using recent 34 year satellite imagery and climate reanalysis data.A target variable to be analyzed is the sea ice concentration retrieved by the National Aeronautics and Space Administration (NASA) Team algorithm [11], and explanatory variables include skin temperature, sea surface temperature, total column liquid water, total column water vapor, instantaneous moisture flux, and low cloud cover that were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim datasets [32].The six explanatory variables were selected by taking account of the correlation coefficients for sea ice concentration and multicollinearity among variables as well.The OLS regression models were useful in summarizing climatological patterns that can be found in the relationships between sea ice concentration and climate factors, and the ARIMA models had advantages in the improvements of prediction accuracy.Our study area is the Barents and Kara Seas, which have experienced considerable sea ice changes for the period.

Satellite and Climate Datasets
The Barents and Kara Seas area was selected to analyze sea ice changes on a regional scale.Sea ice concentration in this area exhibits very high variability because it is covered by relatively thin seasonal ice [33] impacted by highly variable Atlantic water inflow [34] and atmospheric forcing primarily driven by North Atlantic Oscillation [35,36].The data for sea ice concentration produced by the NASA Team algorithm, which has been used in many sea ice studies, was obtained from the website of National Snow and Ice Data Center (NSIDC).An enhancement of the algorithm by the NASA Team on sea ice concentration overcomes the problem of a low ice concentration bias associated with surface snow effects.It is calculated based on the brightness temperature difference between 19 and 37 GHz channels obtained from SMMR and SSM/I [11].We used monthly dataset with a spatial resolution of 0.25 × 0.25 degrees in the polar stereographic projection centered on the North Pole.The Barents and Kara Seas area was delineated using a region mask provided by NSIDC (Figure 1), which consists of 3912 pixels per scene.
As climate factors affecting the changes of sea ice concentration, we used monthly means of ERA-Interim products, which is the latest global climate reanalysis provided by ECMWF.We first investigated possible explanatory variables in relation to temperature, water, radiation, wind, pressure, heat energy, and cloud conditions.Because monthly radiation variables (e.g., surface net solar radiation and surface net thermal radiation) were only provided in forecasted values not in reanalyzed ones, they were not included in our analyses.Wind variables (e.g., wind speed and U/V component), pressure variables (e.g., mean sea level pressure and surface pressure), and heat energy variables (e.g., sensible heat and latent heat) which showed relatively low correlations (  < 0.5) to sea ice concentration during 1979-2012 were also excluded.Hence, we divided the remaining explanatory variables into three groups: (1) temperature-related variables including 2-m temperature, 2-m dewpoint temperature, skin temperature, and sea surface temperature; (2) water-related variables including vertical integral of water vapor, total column liquid water, total column water vapor, total column water, and instantaneous moisture flux; and (3) cloud-related variables such as low, medium, and high cloud cover.We finally selected skin temperature, sea surface temperature, total column liquid water, total column water vapor, instantaneous moisture flux, and low cloud cover as appropriate explanatory variables by considering the correlation coefficients to sea ice concentration and multicollinearity between variables within the group.The skin temperature (K) is defined as the temperature of the top skin of the sea (approximately ≤0.01 mm), and the sea surface temperature (K) indicates the temperature of the sea at approximately 20-30 cm depth.The total column liquid water (kg/m 2 ) denotes vertical integral of water in the liquid phase from the ground to the nominal top of the atmosphere expressing the total amount of cloud liquid water, and the total column water vapor (kg/m 2 ) is for the total amount of water vapor in the atmosphere.The instantaneous moisture flux (kg/m 2 /s) indicates the amount of evaporation for a unit area per second.Low cloud cover (0-1) is the fraction of clouds in the low layer, where the ratio of pressure to surface pressure is >0.8.The ERA-Interim products have been advanced by a data assimilation scheme based on 12 h 4D-Var, which possesses improved model physics, fast radiative transfer model, and better formulation of background error constraint [32].The data is originally produced with a spatial resolution of 0.75 × 0.75 degrees and can be also provided on the grids of 0.25 × 0.25 and 0.5 × 0.5 degrees through a spatial interpolation.We used the data on the 0.25 × 0.25-degrees grid and its geographic projection with latitude and longitude was converted into the polar stereographic projection in accordance with the NSIDC sea ice products.A set of map projection parameters such as latitude of standard parallel, longitude of central meridian, false easting, and false northing was specified for the conversion [37].
Finally, 408 monthly layers for the 34 years (1979-2012) were aggregated from the NSIDC sea ice products and the ECMWF reanalysis datasets in order to examine the relationships between sea ice concentration and climate factors such as skin temperature, sea surface temperature, total column liquid water, total column water vapor, instantaneous moisture flux, and low cloud cover.Explanatory variables were normalized in the form of z-score calculated as (  −  )/  , where   is an individual value,  is the mean, and   is the standard deviation for the entire pixels during 1979-2012.This is because normalized values allow for comparisons among the regression coefficients of explanatory variables even though they originally had different units.

Statistical Models
A regular regression model is based on the ordinary least squares method, which can be expressed in the Formula: where  is a response variable and  1 to   are explanatory variables.In remote sensing studies,  is generally a remotely sensed variable and  1 to   are environmental variables of interest. 0 represents the intercept, and  1 to   are the slopes of the relationship between  and  1 to   .The error term  may include all other factors influencing the response variable  except for the regressors  1 to   .In this study,  is the sea ice concentration and  1 to   are skin temperature, sea surface temperature, total column water, total column liquid water, instantaneous moisture flux, and low cloud cover, respectively.The OLS regression model is often assumed to apply universally over the whole area and the whole period, which implies spatial and temporal stationarities in the relationship between the response and explanatory variables.Since some environmental phenomena can be explained in terms of temporality or seasonality, the ARIMA model may be useful in explaining complex long-range dataset.In contrast to the OLS approach, the ARIMA model assumes temporal non-stationarity based on the possible differences in the relationships between response and explanatory variables over time [38].The model is briefly expressed as ARIMA , ,  where parameters p, d, and q are non-negative integers that refer to the order of autoregressive (), integrated (), and moving average () parts of the model, respectively.Also, seasonal ARIMA model is denoted as ARIMA(, , )(, , ) with additional parameters such as seasonal autoregressive order (), seasonal differencing order (), and seasonal moving average order (  ).These parameters can determine the predictability of an ARIMA model and can be optimized by minimizing the criteria such as AIC (Akaike information criterion) and BIC (Bayesian information criterion).A typical seasonal ARIMA model has the following form [39]: where  is the backshift operator;   is the time series;  is the mean term; () is the non-seasonal autoregressive operator;   (  ) is the seasonal autoregressive operator; () is the non-seasonal moving average operator;   (  ) is the seasonal moving average operator;   is the random error.In order to employ the ARIMA method in the time-series modeling with multiple explanatory variables, we can add regressors to an ARIMA equation, which literally adds the regressors to the right-hand-side of the equation.

OLS Regression Model Results
Using the monthly data for sea ice concentration and climate variables, we derived OLS regression equations as follows:  Skin temperature at the top of sea surface had a considerable seasonal variation when compared to the sea surface temperature at 20-30 cm depth (Figure 2).The reason of such variation is caused by the fact that skin temperature is controlled by solar radiation in summer and by atmospheric condition in winter whereas sea surface temperature is influenced by the North Atlantic current all the year round.In addition, seasonal changes of   and   were quite different (Figure 3) even though both atmospheric and oceanic forcing of sea ice cover could contribute to thermodynamic melting.In warmer months, the skin temperature showed a sharp increase, and such rapid changes might act as a noise leading to the reversion of the regression coefficients:   abruptly changed to positive in July and August.During the winter season (from October to April),   and   showed relatively similar values close to zero, which might be due to the weaker thermodynamic effects of skin temperature and sea surface temperature below the freezing point.Sea ice-cloud interactions and sea ice-water vapor interactions are very complicated since they are associated with many aspects of the energy balance and dynamics in the atmosphere.However, some previous studies, e.g., [40][41][42][43][44][45] put forward that cloud covers are closely related to the changes in sea ice concentration even if the physical mechanism of effect of cloud warming or cooling on the surface or top of the atmosphere may not be well explained.Although the normalized values of total column liquid water (in clouds) and total column water vapor were very similar throughout the months (Figure 4),   and   showed somewhat different seasonal patterns (Figure 5).In particular,   increased in spring (March, April, and May) but dropped to negative values in June when the melting starts.It can be assumed that the warming or cooling effects of clouds were differently applied according to the month or season and led to such a unique pattern [44,45].The R 2 values for explanatory power were very high in colder months (mostly >0.95 from October to May), but were relatively low in warmer months (approximately 0.8 from June to September) (Table 2) partly because of the low correlation coefficients of explanatory variables in the months (Table 3).Such low correlations may be related to the atmospheric and oceanic effects on the time lag of thermodynamic melting in warmer months [26,46].Then, we prepared satellite-observed monthly sea ice concentration in the Barents and Kara Seas in 2012 (Figure 6) for the validation of the OLS regression models calibrated during 1979-2011.We derived the values of the monthly concentration of sea ice for each pixel of the Barents and Kara Seas in 2012 (Figure 7) and compared them with the NSIDC satellite data to examine the spatial and temporal distributions of prediction errors (Figure 8).
The pixels with greater errors overestimated or underestimated were found according to months.This implies a time-series approach based on temporally varying relationships may help diminish the prediction errors in the study area.The results of validation were presented in terms of mean bias and monthly root mean square error (RMSE) in Table 4.In addition to year 2012, the validations were conducted for the years 2002, 1992, and 1982.The R 2 of the OLS regression models were shown in Table 5, and the mean bias and RMSE were presented in Table 6, which were not significantly different from those of year 2012.Also, the seasonal patterns of the six regression coefficients were almost same as those in 2012.Given that climatic characteristics might vary according to individual years, the six explanatory variables can be considered representative to predict sea ice concentration under the changes of atmospheric and oceanic conditions.

Time-Series Model Results
Unlike the OLS regression, the ARIMA model can apply for each pixel containing a multivariate time series of the sea ice concentration and its six explanatory variables (Figure 9).We employed the R "forecast" library for optimizing the model parameters of ARIMA(, , )(, , ) with the six regressors.The calibration period was set as between January 1979 and the month right before the predicted month: for example, a time series from January 1979 to April 2012 was used in predicting May 2012. Figure 10 shows the spatial distribution of estimated values of sea ice concentration by the ARIMA models, and Figure 11 is the map of validation accuracies.The RMSE of the ARIMA models was summarized in Table 7, and the improvements against the OLS models were illustrated in Figure 12.Particularly for May and June when the RMSE of the OLS model were worst (0.344 in May and 0.514 in June), the ARIMA model shows considerable improvements, by 0.2 and 0.332, respectively.The average improvement of RMSE was 0.076, which reveals the effect of time-series modeling given that the range of sea ice concentration is between 0 and 1.In addition, the validation accuracies of the ARIMA were relatively stable throughout the months: the standard deviation of monthly RMSE was 0.036 for the ARIMA, but 0.119 for the OLS model.However, the RMSE slightly worsened for two months (by 0.049 in July and by 0.098 in October), which might be recovered by using another time-series approach like vector autoregression.

Conclusions
In this paper, we described the statistical modeling of sea ice concentration in relation to climatic factors, using satellite imagery and climate reanalysis data for the Barents and Kara Seas during 1979-2012.The OLS regression model summarized the whole years and provided information about overall climatological characteristics in the relationships between sea ice concentration and climate variables.In particular, the ARIMA method was first introduced to statistical model for sea ice concentration and it helped improve prediction accuracies because the time series of the sea ice concentration is such a long-range dataset that the relationships may not be explained by a single equation of the OLS regression.We found that temporally varying relationships between sea ice concentration and the climate factors such as skin temperature, sea surface temperature, total column liquid water, total column water vapor, instantaneous moisture flux, and low cloud cover were modeled by the ARIMA method, which resulted in better prediction accuracies.The RMSE improvement was 0.076 on average (0.199 by OLS and 0.123 by ARIMA), and the prediction accuracies of the ARIMA were relatively stable throughout the months.Our improved statistical approach with the ARIMA method may be worth consideration when forecasting future sea ice concentration using the climate data provided by general circulation models (GCM).
In addition, some unique characteristics of the climate factors in relation to sea ice concentration were found during the analyses.In July and August when the ice melts, β  showed abrupt positive values presumably because rapid increases of skin temperature might act as a noise in the regression coefficients.During winter, β  and β  showed relatively similar values close to zero partly because of the weaker thermodynamic effects of skin temperature and sea surface temperature below the freezing point.Unlike the total column water vapor, the total column liquid water (in clouds) brought about a peculiar seasonal pattern of   because the warming or cooling effects of clouds were applied differently according to season, which will require further investigations to understand the details.Those results may be useful in better understanding the physical mechanism and in improving the statistical model.
Our result derived from limited number of explanatory variables may not be applied universally when considering the complex climate change system.Therefore, additional explanatory variables related to solar radiation [47], atmospheric refractivity [13], and surface roughness [48] should be incorporated in the statistical models for further improvements in prediction accuracies.In addition, we did not identify the time-lag between explanatory variables and sea ice concentration, but given that previous studies have reported the time-lag in melt onset or freeze-up [46], a closer examination of this will be necessary for improving the time-series modeling.

Figure 1 .
Figure 1.The regional mask around the Arctic provided by the National Snow and Ice Data Center (NSIDC).It includes Arctic Ocean, Barents and Kara Seas, Greenland Sea, Baffin Bay/Davis Strait/Labrador Sea, Gulf of St. Lawrence, Hudson Bay, Canadian Archipelago, Bering Sea, and Sea of Okhotsk.

Figure 2 .
Figure 2. Monthly mean skin temperature (SKT) and sea surface temperature (SST) during 1979-2011.The values for the entire pixels were aggregated.

Figure 3 .
Figure 3. Monthly changes of regression coefficients for skin temperature (SKT) and sea surface temperature (SST).They were calculated from the averages of the normalized variables during 1979-2011.

Figure 4 .
Figure 4. Monthly mean total column liquid water (TCLW) and total column water vapor (TCWV) during 1979-2011.The values for the entire pixels were aggregated.

Figure 5 .
Figure 5. Monthly changes of regression coefficients for the total column liquid water (TCLW) and the total column water vapor (TCWV) during 1979-2011.

188 Figure 6 .
Figure 6.Satellite-observed monthly sea ice concentration in the Barents and Kara Seas in 2012.

Figure 7 .
Figure 7. Monthly sea ice concentration predicted by the OLS regression models for the Barents and Kara Seas in 2012.

Figure 8 .
Figure 8. Prediction errors of the OLS regression models for the Barents and Kara Seas in 2012.

Figure 9 .
Figure 9. Conceptual framework of the ARIMA models to incorporate temporally varying relationships between sea ice concentration and climate variables.

Figure 10 .
Figure 10.Monthly sea ice concentration predicted by the ARIMA models for the Barents and Kara Seas in 2012.

Figure 11 .
Figure 11.Prediction errors of the ARIMA regression models for the Barents and Kara Seas in 2012.

Figure 12 .
Figure 12.Prediction accuracy improvements by the ARIMA models for Barents and Kara Seas in 2012.

Table 1
+    +    +    +    +    (3) where the response variable  denotes sea ice concentration, and the explanatory variables , The p-values of the regression coefficients (  ,   ,   ,   ,   , and   ) produced by t-test were <0.01 for almost all cases ( , , , , and  indicate skin temperature, sea surface temperature, total column liquid water, total column water vapor, instantaneous moisture flux, and low cloud cover, respectively.In order to examine seasonal characteristics of the relationships between sea ice concentration and six climate variables, the above equation was built into 12 different versions for each month.Our calibration and validation schemes employed leave-one-out method for the year 2012, 2002, 1992, and 1982: (1) calibration of 33-year dataset except 2012 (for validation of 2012); (2) calibration of 33 year dataset except 2002 (for validation of 2002); (3) calibration of 33-year dataset except 1992 (for validation of 1992); and (4) calibration of 33-year dataset except 1982 (for validation of 1982).First, monthly means during 1979-2011 were aggregated for each pixel to validate the OLS model for 2012.

Table 1 .
). Coefficients of the OLS regression models calibrated during 1979-2011. 0 denotes intercept, and   ,   ,   ,   ,   , and   are the coefficients of skin temperature, sea surface temperature, total column liquid water, total column water vapor, instantaneous moisture flux, and low cloud cover, respectively.

Table 2 .
R 2 in the OLS regression models calibrated for each month during 1979-2011.

Table 3 .
Pearson correlation coefficients (R) of the six explanatory variables for the sea ice concentration.They were calculated from the averages of the normalized variables during 1979-2011.

Table 4 .
Validation of monthly sea ice concentration in 2012 using the OLS regression models calibrated for each month during 1979-2011.The values for the entire pixels were aggregated.

Table 5 .
R 2 in the OLS regression models calibrated for validating the monthly sea ice concentration in2002, 1992, and 1982.

Table 6 .
Validations of sea ice concentration in 2002, 1992, and 1982 using the OLS regression models.The values for the entire pixels were aggregated.

Table 7 .
RMSE improvements of monthly sea ice concentration in 2012 using the ARIMA models calibrated by the time series for each pixel.The improvements were calculated by subtracting the RMSE of ARIMA model from that of OLS model in Table4.The values for the entire pixels were aggregated.