1. Introduction
Sea ice is a key component of the Earth system. Its extent follows an annual cycle and covers almost 7% of the Earth’s surface at its maximum extent in September, mainly at the poles. It strongly influences climate change through its role in the so-called ice–albedo feedback loop [1]. Arctic sea ice has declined significantly in recent decades [2] and has also shifted toward younger and thinner ice [3].
Since the launch of the first Earth Observation (EO) satellites in the early 1970s, passive microwave radiometers have been used to observe sea ice. The European Space Agency (ESA) Soil Moisture and Ocean Salinity (SMOS) mission [4,5], launched in 2009, the Advanced Microwave Scanning Radiometer (AMSR) series, developed by the Japan Aerospace Exploration Agency (JAXA) and operated on different satellites from 2002 to the present, and NASA’s Soil Moisture Active Passive (SMAP) mission [6], launched in 2015, are key contributors. In particular, this study focuses on passive L-band radiometers such as SMOS, which operates at 1.4 GHz with a penetration depth of up to 1 m or more in low-salinity sea ice.
Currently, two distinct algorithms exist for estimating Arctic thin sea ice thickness from SMOS measurements during the non-melting period (October to April). The first, distributed by ESA, is a semi-empirical algorithm developed by the Alfred Wegener Institute (AWI), as described in [7,8]. A significant drawback is that it does not account for snow on top of the sea ice, which is critical because snow substantially affects the emitted L-band radiation [9] and therefore the thickness retrieval. The second algorithm, from the University of Bremen (UB) and detailed in [10], adopts an empirical approach. A notable limitation of the UB product is its restricted sensitivity, which reaches only up to 0.5 m. While the UB product explicitly limits the retrieved sea ice thickness, the AWI product has no clearly defined upper limit; its sensitivity typically ranges between 0.5 and 1.5 m depending on sea ice conditions.
Other emerging methodologies include the use of Global Navigation Satellite System Reflectometry (GNSS-R), which, beyond its previous applications for retrieving various geophysical parameters, has also been explored for sea ice applications. Recent studies have shown its potential to detect sea ice and estimate its thickness, although these results have so far only been demonstrated in simulations [
11,
12]. Nevertheless, GNSS-R remains promising as a valuable complement to passive L-band measurements in the future.
Artificial intelligence (AI) has the potential to transform the processing and analysis of EO data acquired through remote sensing, enhancing our ability to monitor and predict the evolution of many geophysical variables. Recent efforts to retrieve sea ice parameters from satellite observations have increasingly focused on applying machine learning techniques to passive microwave data. One of the earlier examples is found in [13], where a neural network was used to estimate snow depth over sea ice from AMSR2 and SMOS data. Similarly, ref. [14] applied two machine learning algorithms to retrieve sea ice thickness from a combination of TechDemoSat-1 (TDS-1) and SMOS observations. Ref. [15] developed an ensemble Convolutional Neural Network (CNN) to retrieve daily sea ice thickness from AMSR2 data. Likewise, refs. [16,17] used a combination of frequencies ranging from 1.4 to 36 GHz, applying AI-based methodologies to estimate and analyze sea ice thickness and volume trends. Beyond SMOS and AMSR2, ref. [18] used a neural network to retrieve sea ice thickness from FSSCat nanosatellite data [19]. Meanwhile, refs. [20,21] investigated machine learning approaches for sea ice sensing using data from C-band Synthetic Aperture Radar (SAR) sensors such as Sentinel-1.
An application of machine learning to Arctic sea ice thickness retrieval from SMOS observations was presented in [
22]. That study employed two decision-tree-based algorithms—Random Forest (RF) and Gradient Boosting (GB)—within a supervised learning framework. These algorithms were trained on data derived from maps generated through model inversion. The results were promising, demonstrating good agreement with ESA’s sea ice thickness product and achieving improved validation against in situ datasets. Building on this foundation, the objective of this work is to assess the performance of deep learning methods in enhancing retrieval accuracy. Specifically, it extends the methodology by implementing a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) neural network. These approaches represent significant advancements in data processing: the CNN leverages spatial coherence by treating input variables as maps, while the LSTM captures temporal coherence by utilizing sequential information from consecutive observations.
Since the training dataset is entirely derived from a physical emission model due to the lack of sufficient in situ data, the performance of the ML algorithms is ultimately constrained by the accuracy and assumptions of that model. Therefore, this study focuses on evaluating the ability of different architectures to learn and approximate its behavior. To assess their performance, the operational ESA sea ice thickness product is included as a baseline for comparison. Accordingly, the main contributions of this work are threefold: (1) a comparative evaluation of three ML architectures (RF, CNN, LSTM) for retrieving sea ice thickness from L-band radiometry; (2) an assessment of how these approximations perform on real-world satellite data; and (3) a benchmark comparison against the ESA product to highlight strengths and limitations relative to existing retrieval approaches.
The paper is organized as follows: Section 2 describes the various data sources used as input, the generation of the training dataset, and the datasets employed for validation. Section 3 outlines the different algorithmic approaches. The results of the assessment are presented in Section 4, followed by a discussion in Section 5. Finally, the conclusions of the study are summarized in Section 6.
3. Methodology
This study introduces approaches to sea ice thickness estimation that incorporate both spatial and temporal coherence into the modeling process. Spatial coherence is achieved by utilizing maps as input data, allowing the influence of neighboring pixels to be considered, while temporal coherence is addressed by including data from the previous day to enhance predictions of current conditions. The primary objective is to evaluate and determine the most effective approach among the proposed methods, leaving uncertainty computation as a focus for future work. In previous research [
22], Random Forest (RF) and Gradient Boosting (GB) algorithms were trained using a pixel-by-pixel methodology. Although these models demonstrated strong agreement with ESA’s sea ice thickness product and showed improved correlation with reduced error when validated against in situ datasets, this approach inherently breaks the spatial and temporal continuity present in the original data. In reality, the data exhibit clear spatial and temporal dependencies, and failing to account for them may limit the physical consistency of the predictions. Therefore, this study addresses a necessary step: evaluating whether preserving spatial coherence (as with the CNN) or temporal continuity (as with the LSTM) leads to more physically meaningful and accurate sea ice thickness retrievals. For the three proposed approaches, the input variables are prepared as detailed in
Section 2.2, with some specific modifications for each approach that are described below. It is important to note that although sea ice concentration (SIC) is indeed a major source of uncertainty in sea ice thickness retrievals from L-band radiometry, it is not explicitly included in this assessment, as the comparison is restricted to locations where the ESA product provides valid thickness values. This product assumes 100% SIC, justified by the fact that most ice-covered areas in winter have SIC > 90%, and that uncertainties in SIC products could introduce larger errors than the underestimation caused by assuming full ice cover [
7,
8]. Therefore, to ensure a fair comparison, no SIC correction is applied in this study.
Three different machine learning algorithms are evaluated, but they all rely on the same methodology and use the same input and output variables, which are described in
Section 2.1 and summarized in
Figure 1. Specifically, the four input variables are SMOS brightness temperature, sea ice temperature, sea ice salinity, and snow presence. The output variable is sea ice thickness, which, during the training phase, corresponds to the values derived from the inversion of the emission model, as detailed in
Section 2.2.
3.1. Pixel-by-Pixel Approach: Random Forest and Gradient Boosting
Within supervised learning, and specifically regression, the machine learning algorithms chosen for this task are Random Forest (RF, [39]) and Gradient Boosting (GB, [40]). Both are more robust extensions of decision trees; they share many similarities but differ in how the trees are built and combined. Both algorithms are configured with 50 estimators, a reasonable choice after experimenting with values ranging from 10 to 1000. No further hyperparameters are tuned, since testing showed no discernible improvement in the results. Individual data points are extracted from complete Arctic sea ice thickness distribution maps and fed into these algorithms for training. The approach is therefore named pixel-by-pixel, as training considers individual values within the distribution. The training dataset is composed of 100,000 pixels extracted from the maps of each day, after removing duplicates, from 15 October 2019 to 15 April 2020 and from 15 October 2020 to 15 April 2021. Both RF and GB provide similar results, but in this work the RF is selected for comparison with the other approaches. Using a decision-tree-based method also enables an assessment of feature importance in determining the algorithm’s output. In this case, the feature importances are 94.13%, 4.33%, 1.11%, and 0.43% for SMOS brightness temperature (TB), sea ice temperature, sea ice salinity, and snow presence, respectively. The TB is the dominant factor influencing the results, which aligns with its high variability compared to the other inputs. Sea ice salinity tends to stabilize around a value of 5 after the initial freeze-up [41], and snow presence exhibits limited variability, as a snow layer is present in almost all pixels. Nevertheless, the inclusion of snow presence remains crucial, as it has a clear impact on the measured TB [9].
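The pixel extraction step described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this work: the function name, the dictionary keys, and the choice to remove duplicates within each day (rather than across the whole period) are assumptions.

```python
import math
import random

def sample_pixels(daily_maps, n_per_day=100_000, seed=0):
    """Flatten daily maps into unique (inputs..., thickness) rows.

    daily_maps: list of dicts, one per day, mapping a variable name to a
    2D list (rows x cols). The key names below are hypothetical.
    """
    rng = random.Random(seed)
    names = ("tb", "ice_temp", "salinity", "snow", "thickness")
    samples = []
    for day in daily_maps:
        unique_rows = set()  # a set drops duplicated rows within the day
        n_rows, n_cols = len(day["tb"]), len(day["tb"][0])
        for i in range(n_rows):
            for j in range(n_cols):
                row = tuple(day[name][i][j] for name in names)
                # Skip pixels with missing data (land, open ocean, gaps).
                if any(v is None or math.isnan(v) for v in row):
                    continue
                unique_rows.add(row)
        ordered = sorted(unique_rows)  # deterministic order before sampling
        k = min(n_per_day, len(ordered))
        samples.extend(rng.sample(ordered, k))
    return samples

# Tiny 2 x 2 example: one duplicated pixel and one pixel with missing TB.
nan = float("nan")
day = {
    "tb": [[230.0, 230.0], [240.0, nan]],
    "ice_temp": [[260.0, 260.0], [255.0, 250.0]],
    "salinity": [[5.0, 5.0], [6.0, 5.0]],
    "snow": [[1, 1], [1, 0]],
    "thickness": [[0.3, 0.3], [0.6, 0.2]],
}
training_rows = sample_pixels([day, day], n_per_day=10)
```

Each returned row holds the four inputs followed by the target thickness, which is the layout a tabular regressor such as a Random Forest expects once the last column is split off.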
3.2. Spatially Coherent Approach: Convolutional Neural Network
Convolutional Neural Networks (CNNs; [42]) are a class of deep learning neural networks primarily designed to process and analyze visual data. In this work, we leverage their ability to handle images in order to introduce spatial coherence into the algorithm. Sea ice pixels are not independent of their neighbors; similar conditions should lead to similar thickness distributions. All variables in the selected grid are arrays with dimensions of 896 × 608. The convolutional layers use 3 × 3 kernels with padding to preserve the dimensions of the variables throughout the network architecture.
After hyperparameter tuning, the batch size was fixed at 64 with 200 training epochs. The optimal combination of layers and filters for each convolution is presented in Table 2. Each convolutional layer contains 6, 12, or 24 filters of size 3 × 3 applied to the input image. The ReLU activation function introduces non-linearity, while “same” padding ensures that the spatial dimensions of the output feature maps remain identical to those of the input. Batch normalization is applied after each convolutional layer, normalizing the activations from the previous layer to help stabilize and accelerate the training process. The final convolutional layer contains a single filter of size 3 × 3 with a linear activation, meaning that no non-linearity is applied to the output and the layer performs a purely linear transformation. The training dataset comprises 181 maps extracted from two periods: 15 October 2019 to 15 April 2020, and 15 October 2020 to 15 April 2021.
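To make the effect of “same” padding concrete, the following sketch applies a single 3 × 3 filter with zero padding and stride 1 in plain Python and checks that the map size is preserved. It is an illustration only: the function names are hypothetical, it handles one channel, and it does not reproduce the layer configuration of Table 2.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D convolution with zero ("same") padding, stride 1.

    As in most deep learning libraries, the kernel is not flipped, so this
    is strictly a cross-correlation.
    """
    k = len(kernel)   # kernel is k x k with k odd (3 in the text)
    pad = k // 2      # one ring of zeros around the map for k = 3
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - pad, j + dj - pad
                    # Positions outside the map contribute zero.
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += image[ii][jj] * kernel[di][dj]
            out[i][j] = acc
    return out

def relu(feature_map):
    """Element-wise ReLU, the non-linearity used in the hidden layers."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def output_size(n, k=3, pad=1, stride=1):
    """Output length along one axis for a convolution of kernel size k."""
    return (n + 2 * pad - k) // stride + 1

image = [[1.0] * 6 for _ in range(4)]    # small stand-in for an 896 x 608 map
kernel = [[1.0] * 3 for _ in range(3)]
feature_map = relu(conv2d_same(image, kernel))
```

With a padding of one pixel, `output_size(896)` and `output_size(608)` return 896 and 608, whereas without padding each 3 × 3 layer would shrink every axis by two pixels.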
3.3. Temporally Coherent Approach: Long Short-Term Memory Neural Networks
Although including spatial coherence allows the CNN to achieve robust performance, temporal coherence should also be explored. Therefore, the Long Short-Term Memory (LSTM; [43]) neural network structure is implemented. LSTMs are a specialized type of recurrent neural network (RNN) designed to capture and learn temporal dependencies in sequential data by incorporating a memory cell. Intuitively, sea ice evolution has a clear temporal component; the present-day prediction should be linked to the previous day’s conditions. Therefore, for this approach, the dataset is shifted so that the sea ice thickness distribution is predicted using the previous day’s information. However, since the input for this approach is 1D, this methodology works pixel-by-pixel, so no spatial component is included.
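One possible way to build such a shifted dataset for a single pixel is sketched below. The exact windowing is not specified in the text, so the choice made here (each sample contains the previous day and the current day, and the target is the current-day thickness) is an assumption, as are the function and variable names.

```python
def make_lagged_samples(features, thickness, lag=1):
    """Pair previous-day and current-day inputs for ONE pixel.

    features:  list over days of input vectors, e.g.
               [tb, ice_temp, salinity, snow].
    thickness: list over days of the target thickness for the same pixel.
    Returns (sequences, targets); each sequence holds lag + 1 consecutive
    days and ends on the day whose thickness is the target.
    """
    if len(features) != len(thickness):
        raise ValueError("features and thickness must cover the same days")
    sequences, targets = [], []
    for t in range(lag, len(features)):
        sequences.append([list(f) for f in features[t - lag : t + 1]])
        targets.append(thickness[t])
    return sequences, targets

# Four consecutive days for one pixel (illustrative values only).
features = [
    [220.0, 262.0, 6.0, 0],
    [228.0, 260.0, 5.5, 1],
    [233.0, 259.0, 5.0, 1],
    [236.0, 258.0, 5.0, 1],
]
thickness = [0.20, 0.28, 0.35, 0.41]
sequences, targets = make_lagged_samples(features, thickness, lag=1)
```

The first day of each period has no predecessor and therefore produces no sample, so a series of N days yields N − 1 training sequences with `lag=1`.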
The architecture consists of three LSTM layers, summarized in
Table 3. All of these layers use a hyperbolic tangent activation function and a sigmoid recurrent activation function to control the flow of information through the cell state. The output of these layers is passed as a sequence to the next layer, except for the last layer, which returns only the final output of the sequence. A final dense layer with a single unit and a linear activation function is applied to produce the model’s prediction. After testing, the optimal batch size is set to 2, with 50 training epochs. The training dataset is the same as that described for the CNN algorithm: 181 maps from 2019 and 2020, but with the temporal shift as previously mentioned.
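For readers less familiar with the role of the two activation functions, the sketch below implements one time step of a standard LSTM cell in plain Python, followed by a linear dense output. It follows the textbook formulation (sigmoid gates, tanh candidate and output squashing); the parameter layout and function names are assumptions, and the layer sizes of Table 3 are not reproduced.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, U, h, b):
    """Return W @ x + U @ h + b for plain nested lists."""
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params[name] = (W, U, b) for gates i, f, o, g."""
    pre = {name: affine(params[name][0], x, params[name][1], h_prev,
                        params[name][2])
           for name in ("i", "f", "o", "g")}
    i = [sigmoid(z) for z in pre["i"]]    # input gate (recurrent activation)
    f = [sigmoid(z) for z in pre["f"]]    # forget gate
    o = [sigmoid(z) for z in pre["o"]]    # output gate
    g = [math.tanh(z) for z in pre["g"]]  # candidate cell update
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def dense_linear(h, w, b):
    """Single-unit dense layer with linear activation (unbounded output)."""
    return sum(wk * hk for wk, hk in zip(w, h)) + b

# One hidden unit, two inputs, all weights zero: every gate equals 0.5.
zero_gate = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero_gate for name in ("i", "f", "o", "g")}
h, c = lstm_step([1.0, 2.0], [0.0], [2.0], params)
```

The sigmoid gates take values between 0 and 1 and decide how much of the previous cell state is kept and how much of the candidate update is written, while the hidden state stays within (−1, 1). The final linear unit is unbounded, so nothing in this layer by itself constrains the prediction to be non-negative.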
4. Results
To evaluate the different approaches, validation against in situ data is performed. Specifically, the two datasets described in Section 2.3 are used: the BGEP moorings and the ESA SMOSice campaign data. Furthermore, the ESA/AWI product described in Section 2.1.1 is included in the assessment as a baseline. To ensure a fair assessment, all algorithm predictions exceeding 1 m are capped at exactly 1 m. The selected metrics are the correlation coefficient (R²), the mean absolute error (MAE), the standard deviation (Std Dev), and the overestimation and underestimation percentages. A value is counted as overestimated or underestimated if it exceeds or falls short of the ground truth value by at least 25%.
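The metric definitions above can be sketched in a few lines of Python. Two points are assumptions rather than statements from the text: R² is computed here as the squared Pearson correlation, and the standard deviation is taken over the prediction errors. The function name is hypothetical.

```python
import math

def validation_metrics(predicted, observed, cap=1.0, tol=0.25):
    """R2, MAE, Std Dev and over/underestimation percentages."""
    pred = [min(p, cap) for p in predicted]  # cap predictions at 1 m
    n = len(pred)
    errors = [p - o for p, o in zip(pred, observed)]
    mae = sum(abs(e) for e in errors) / n
    mean_err = sum(errors) / n
    # Assumption: "Std Dev" is the standard deviation of the errors.
    std_dev = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / n)
    # Assumption: R2 is the squared Pearson correlation coefficient.
    mean_p, mean_o = sum(pred) / n, sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, observed))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r2 = cov ** 2 / (var_p * var_o)
    # A prediction counts as over/underestimated when it differs from the
    # in situ value by at least 25% of that value.
    over = sum(p >= (1 + tol) * o for p, o in zip(pred, observed))
    under = sum(p <= (1 - tol) * o for p, o in zip(pred, observed))
    return {
        "r2": r2,
        "mae": mae,
        "std_dev": std_dev,
        "over_pct": 100 * over / n,
        "under_pct": 100 * under / n,
    }

observed = [0.2, 0.4, 0.6, 0.7]    # in situ thickness (m), illustrative
predicted = [0.2, 0.6, 0.3, 1.5]   # last value is capped to 1.0 m
metrics = validation_metrics(predicted, observed)
```

In this toy example two of the four predictions exceed the in situ value by at least 25% and one falls short by at least 25%, so the overestimation and underestimation percentages are 50% and 25%. The sketch assumes that both series have non-zero variance.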
To provide an initial visual assessment of each algorithm, Figure 2 presents Arctic-wide and Barents Sea-focused sea ice thickness maps produced by each algorithm. In these maps, the areas covered by the validation datasets are marked. All four approaches capture the broad spatial patterns of sea ice thickness (SIT), with thicker ice concentrated north of Greenland and the Canadian Arctic Archipelago, and thinner ice in the marginal zones. The ESA product shows relatively smooth gradients and generally higher SIT values in the marginal ice zone compared to the ML methods. The RF exhibits sharper spatial transitions and slightly noisier patterns, particularly near the ice edge. The CNN output appears smoother than that of the RF, yet preserves fine-scale spatial variability, especially in the Barents Sea. The LSTM model produces the most homogeneous fields, with reduced spatial detail, likely reflecting the influence of temporal averaging or memory in the algorithm’s architecture. Overall, the ESA, RF, and LSTM algorithms produce broadly similar sea ice thickness distributions, with local differences that become more evident upon closer inspection, such as in the Barents Sea region. In contrast, the CNN model tends to predict higher thickness values in several areas and shows limited sensitivity to the finer spatial structures associated with thinner ice. This characteristic can significantly affect the model’s performance, particularly given that the primary objective of the methodology is to retrieve thin sea ice thickness. Consequently, such behavior may also influence the outcomes of the validation process.
Figure 3 compares the predictions of the different algorithms against the BGEP moorings dataset described in Section 2.3. The points are complemented by their respective density to depict the distribution of output values. Because many data points equal 1 m after the imposed cap, these points are excluded from the density computation to improve the visualization across the thickness range. The RF algorithm presents similar results to ESA, while the CNN tends to overestimate, and the LSTM shows more dispersion and underestimation, especially for thinner ice. The most homogeneous point-density distribution is presented by the RF, highlighting its robustness. The metrics are provided in Table 4, where the initially observed characteristics of each algorithm are confirmed. Although all approaches show similar error, correlation, and dispersion, it is clear that the CNN is overestimating, by approximately 10% more than the others. These results suggest that for this dataset, both the RF and ESA algorithms slightly outperform the others. Therefore, the spatially and temporally coherent approaches, represented by the CNN and LSTM algorithms, do not capture the sea ice evolution depicted by the BGEP moorings any better.
The ESA SMOSice campaign provides a unique dataset that contains sufficient thin sea ice to perform validation. In fact, this dataset was used to validate the ESA SMOS sea ice thickness product described in Section 2.1.1. An assessment similar to that for the BGEP moorings dataset is conducted, using the three sub-datasets that were collected: HEM, SEM, and ALS. Figure 4 shows the validation of the assessed algorithms against these in situ datasets. The different sub-datasets present distinct SIT distributions, allowing for a comprehensive assessment of the algorithm performance. For the HEM data, which cover a broader SIT range, the RF and CNN models show the best agreement with the reference measurements, followed closely by ESA, while the LSTM consistently underestimates SIT across the range. In the case of SEM, which measured very thin ice, none of the algorithms are capable of reproducing the observed values, since predictions are consistently too dispersed. For the ALS dataset, which includes thicker ice, all algorithms tend to underestimate SIT, although the CNN performs better, showing the least bias. Notably, the LSTM produces some negative thicknesses, which are physically unrealistic; this is attributed to the inherent stochasticity of the algorithm’s architecture, whose linear output layer does not constrain predictions to non-negative values.
Table 5 summarizes the validation performance of the four algorithms across the three datasets: HEM, SEM, and ALS. In the HEM dataset, the CNN achieved the highest R² (0.51) and lowest MAE (0.30 m), while the ESA and RF models followed closely with comparable MAE values and moderately lower R² scores. The LSTM performed similarly in terms of MAE but displayed a lower R², indicating slightly less correlation with the observed data. For the SEM dataset, performance dropped across all algorithms, with R² values remaining below 0.25, reflecting the increased difficulty in modeling this dataset. The CNN again had the highest R² (0.23), but RF and ESA maintained lower MAE values, suggesting a better trade-off between error magnitude and model consistency. Regarding the ALS dataset, all models improved substantially, with the LSTM and RF showing stronger R² values (0.59 and 0.57, respectively) and lower MAE (0.19–0.22 m), with the CNN and ESA being slightly worse.
5. Discussion
The results presented in
Section 4 suggest that the two in situ datasets involved in the validation have some disparities regarding their thickness distributions. Therefore, a thorough analysis is required to avoid divergent conclusions.
Regarding the BGEP moorings dataset,
Figure 5 shows the temporal evolution from the in situ mooring measurements and from the predictions of the algorithms. Here, the algorithm’s predictions are not limited to 1 m since no metric computation is involved, but they are graphically separated. It is also noteworthy that the unrealistic data jumps occur because only the BGEP-measured values from the initial growth phase (lower than 1 m) are considered for each period, i.e., from October to April of the following year, spanning from 2010 to 2021.
From this representation, it is clear that most of the error arises because the slow growth observed by the moored buoys is not captured by any algorithm. Except for some specific periods, the algorithms transition rapidly from very thin to already grown ice. This effect is not unexpected given the resolution disparity between the in situ and remote sensing observations: each buoy represents a single point within one vast satellite pixel, within which thickness can vary greatly. Furthermore, these moorings cannot be considered stable platforms, since they are affected by physical processes such as ice drift and the ocean currents underneath. Despite these drawbacks, and since the validation is performed equally for all the algorithms, it can be concluded that the CNN significantly overestimates, especially from 0.5 m onward. The other approaches behave similarly to one another in both Figure 3 and Figure 5.
The validation using the ESA SMOSice campaign dataset highlights a systematic underestimation of sea ice thickness (SIT), particularly in the thin ice regime, with varying levels of accuracy and bias across the assessed algorithms. It is important to note that all datasets were interpolated onto the SMOS grid, facilitating intercomparison among products. However, none of the algorithms could reproduce the SEM measurements, which suggests this dataset may have limited reliability, and results based on it should be interpreted with caution.
In general, ESA estimates exhibited lower dispersion and smaller errors, while the RF algorithm delivered a balanced performance across all datasets. The CNN showed higher correlation with in situ data but also slightly increased error and variability. The LSTM generally underperformed, although it yielded better results under more stable conditions, such as those represented by the ALS dataset.
The pixel-by-pixel approach of the RF algorithm performs comparably to that of ESA, showing similar validation metrics across all datasets. Both exhibit strong correlations and reasonable levels of error and dispersion, despite their relative simplicity in terms of algorithmic complexity, as neither explicitly incorporates spatial or temporal coherence. However, they differ in how sea ice emission is modeled. Although a detailed discussion of emission modeling is beyond the scope of this work, it is worth noting that the RF algorithm—and the overall methodology presented—can be readily adapted to any desired emission model. This flexibility stems from the fact that modifying the emission modeling would only require regenerating the training dataset using the new forward model. With the new dataset produced, the rest of the methodology would remain unchanged, underscoring its robustness and adaptability.
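As a rough illustration of this adaptability, the sketch below regenerates thickness labels by inverting an interchangeable forward model. The toy emission model, its coefficients, the bisection inversion, and all function names are assumptions made for illustration; they are not the emission model or the inversion scheme used in this work.

```python
import math

def toy_forward_model(thickness, ice_temp, salinity, snow):
    """Hypothetical stand-in for an emission model: TB (K) from thickness (m).

    TB rises from an open-water value and saturates for thick ice. All
    coefficients are invented for illustration only.
    """
    tb_water = 100.0
    tb_thick = 240.0 + 5.0 * snow
    e_fold = max(0.05, 0.3 + 0.01 * (271.0 - ice_temp) - 0.01 * salinity)
    return tb_thick - (tb_thick - tb_water) * math.exp(-thickness / e_fold)

def invert_thickness(tb_obs, ice_temp, salinity, snow, forward_model,
                     d_max=1.5, tol=1e-4):
    """Bisection inversion, assuming TB increases monotonically with thickness."""
    lo, hi = 0.0, d_max
    if tb_obs <= forward_model(lo, ice_temp, salinity, snow):
        return lo
    if tb_obs >= forward_model(hi, ice_temp, salinity, snow):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if forward_model(mid, ice_temp, salinity, snow) < tb_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def build_training_set(observations, forward_model):
    """Attach an inverted thickness label to each (tb, temp, salinity, snow)."""
    rows = []
    for tb, temp, sal, snow in observations:
        d = invert_thickness(tb, temp, sal, snow, forward_model)
        rows.append((tb, temp, sal, snow, d))
    return rows

# Round trip: simulate a TB for 0.4 m of ice, then recover the thickness.
tb_sim = toy_forward_model(0.4, 260.0, 5.0, 1)
rows = build_training_set([(tb_sim, 260.0, 5.0, 1)], toy_forward_model)

# Swapping the emission model only changes the callable that is passed in.
def linear_model(d, temp, sal, snow):
    return 100.0 + 100.0 * d

rows_alt = build_training_set([(150.0, 260.0, 5.0, 1)], linear_model)
```

Only the callable passed to `build_training_set` changes when the emission model is replaced; the training and validation steps that consume the resulting rows are unaffected.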
In contrast, the CNN tends to overestimate sea ice thickness across the validation datasets and shows the highest dispersion. This may be due to the fact that the input variables are already smooth and spatially coherent (see Figure 2 in [
22]), which reduces the need for additional spatial modeling. This explains why the simpler pixel-wise methods perform well, as spatial coherence is already embedded in the inputs, making complex architectures like CNNs less advantageous in this context. Similarly, the temporally coherent approach of the LSTM does not appear to improve the estimation of sea ice thickness. One possible explanation is that local sea ice inhomogeneities evolve too rapidly to be captured at the resolution of satellite observations. Additionally, these small-scale variations are effectively averaged out, meaning that the temporal evolution of each pixel is smoother and can be captured by simpler models, reducing the need for complex temporal modeling techniques such as LSTMs.
Regarding the computational resources needed for each assessed algorithm, there are important differences among them. Starting with the ESA product, although no information is published, the description in [8] includes the simulation of a thermodynamic model, which can easily increase the computational requirements. Since the RF is a simple ML algorithm, its computation time is negligible, both for training and for prediction. The CNN and the LSTM, being deep learning approaches, require significantly more computational resources. Nevertheless, the procedures involved in the presented algorithms are in no way limiting for a hypothetical operational product. The order of magnitude of the training time ranges from seconds for the RF to hours for both the CNN and the LSTM. To perform predictions, the RF needs nanoseconds while the others need seconds. No specific metrics are provided, since the computational resources are not an impediment and there is thus no need to include high-performance computing metrics or similar information.
Finally, it is worth mentioning that aside from the algorithms presented here, similar variants have been tested within this study. Regarding the pixel-by-pixel approach, aside from the mentioned RF and GB, the widely-used XGBoost [
44] was tested without achieving better results. For the spatially coherent approach, the U-Net structure [
45] was also used as a variant of the standard CNN, but again no significant improvements were found. Moreover, a fourth approach was attempted for the sake of completeness, accounting for both the spatial and temporal coherence. This was carried out by using a 2D-CNNLSTM structure, combining the two already presented methods. However, despite the important increase in computational resources required for the training phase, the results were not better than those obtained by using the standard CNN or LSTM.
6. Conclusions
This study evaluates various approaches for retrieving thin sea ice thickness from L-band radiometry observations, including a machine learning framework, two deep learning algorithms, and the ESA official product as a baseline. The findings highlight that there is still significant potential to refine the current algorithms used for sea ice thickness retrieval from SMOS. These refinements are also critical for ESA’s upcoming Copernicus Imaging Microwave Radiometer (CIMR) mission, which will expand observations to higher frequencies (up to 36.5 GHz) while retaining 1.4 GHz capabilities.
Overall, the ESA product has demonstrated robustness despite known limitations, while the novel methodologies show potential advantages, offering opportunities to optimize computational efficiency and improve the retrieval.
Among the assessed methods, the RF algorithm shows balanced performance, comparable to the ESA’s product in terms of correlation, error, and dispersion, despite being much simpler and not modeling spatial or temporal coherence explicitly. This performance, coupled with its flexibility in adapting to different emission models, underscores the robustness and adaptability of the RF approach. In contrast, the CNN tends to overestimate SIT and exhibits greater dispersion, suggesting that the spatial coherence already embedded in the input variables diminishes the added value of the CNN’s architectural complexity. Similarly, the LSTM algorithm does not significantly improve results, likely due to the rapid evolution and spatial averaging of sea ice conditions that make temporal coherence less impactful at the satellite scale.
Based on these findings, the assessment suggests that while the ESA SMOS sea ice thickness retrieval algorithm remains a valid approach, there is room for improvement. The proposed pixel-by-pixel approach, represented by the RF algorithm, shows robust performance and could serve as a solid foundation for future operational products. In comparison, more complex structures such as the CNN and the LSTM do not offer clear advantages in this case. Therefore, future work should not focus exclusively on developing more complex methodologies, but also on improving the modeling of sea ice emission at L-band. This improvement could be directly applied to the presented method to further increase the accuracy of the sea ice thickness estimates.