Applying U-Net for Estimating AVHRR-Based Snow Cover Fraction (ESA CCI+ Snow) During Cloud Cover and Polar Night in Scandinavia

Jakob, Fabio; Neuhaus, Christoph; Wunderle, Stefan

doi:10.3390/rs18122030

by Fabio Jakob ¹,*, Christoph Neuhaus ¹,² and Stefan Wunderle ¹,²

¹ Institute of Geography, University of Bern, Hallerstrasse 12, 3012 Bern, Switzerland

² Oeschger Center for Climate Change Research (OCCR), University of Bern, Hochschulstrasse 4, 3012 Bern, Switzerland

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(12), 2030; https://doi.org/10.3390/rs18122030

Submission received: 14 April 2026 / Revised: 15 June 2026 / Accepted: 17 June 2026 / Published: 18 June 2026

Highlights

What are the main findings?

A U-Net trained on a single year reconstructs AVHRR-based snow cover fraction (SCF) across Scandinavia with R² = 0.9342 and RMSE = 0.1127, outperforming spatial, physical, and pixel-wise machine learning baselines.
Independent ground station validation yields 86.7% accuracy and F1 = 88.0%, matching the quality of the ESA CCI L3C SCFV AVHRR v4.0 product in real observational gaps.

What are the implications of the main findings?

Physically meaningful predictors (snow water equivalent, temperature, elevation, land cover) enable continuous, cloud- and polar-night-robust SCF reconstruction without concurrent optical observations.
The framework represents a promising first step towards extending SCF reconstruction to the full 1979–2023 ESA CCI AVHRR SCF, though transferability to other regions and time periods requires explicit validation before broader application can be attempted.

Abstract

Snow cover fraction (SCF) records derived from optical satellite sensors such as AVHRR are systematically interrupted by cloud contamination and polar night conditions, leaving large spatiotemporal data gaps that limit their utility for climate and hydrological applications. This study presents a U-Net–based deep learning framework for reconstructing missing SCF values in Scandinavia over a 15-year period (2000–2014), using the ESA CCI L3C SCFV AVHRR v4.0 product as both partial input and training target. The model integrates physically meaningful auxiliary predictors (snow water equivalent (SWE), near-surface air temperature, elevation, and land cover) harmonized to a common 0.05° grid, enabling reconstruction in the complete absence of concurrent optical observations. Trained on a single year with extensive synthetic masking (91.5% of valid SCF pixels withheld), the U-Net achieves an R² of 0.9342 and RMSE of 0.1127, outperforming spatial interpolation, a SWE-based physical baseline, and pixel-wise machine learning baselines. Feature importance analysis confirms that SWE and temperature dominate predictive skill, with the observed SCF input contributing negligibly. Independent validation against ground station observations yields 86.7% binary classification accuracy and an F1 score of 88.0%, comparable to the 87.8% accuracy of the original satellite retrievals, demonstrating the viability of deep learning–based gap-filling for producing continuous SCF records under cloud cover and polar night.

Keywords:

snow cover fraction; U-Net; AVHRR; deep learning; gap-filling; cloud contamination; polar night; Scandinavia

1. Introduction

Snow cover is a key component of the Earth system, exerting strong control on surface energy balance, hydrological processes, and land–atmosphere interactions, particularly in high-latitude regions [1,2,3]. Through its high albedo and insulating properties, seasonal snow influences soil temperature, permafrost, runoff timing, and ecosystem dynamics, making accurate snow cover fraction (SCF) information essential for climate monitoring, hydrology, and environmental modeling applications [4,5,6,7,8]. Remote sensing provides an effective means of monitoring snow cover across large spatial and temporal scales [9]. Optical satellite sensors are especially valuable for snow detection due to the distinct spectral signature of snow in visible and shortwave infrared wavelengths [10]. The ESA Climate Change Initiative (CCI) Snow project has produced one of the longest satellite-based SCF records available, providing daily fractional snow cover estimates derived from AVHRR visible-channel reflectances from 1979 to 2023, making it a key dataset for long-term cryospheric monitoring and climate trend analysis [11]. However, reliable SCF retrieval from optical satellite imagery remains challenging. Cloud contamination frequently obscures the surface and leads to large spatial and temporal data gaps, particularly in regions with persistent cloud cover, such as northern Europe [12,13]. In addition, polar night conditions at high latitudes prevent optical observations during extended winter periods, further limiting data availability [14,15]. These factors result in incomplete SCF time series that constrain their usability for applications requiring continuous coverage. To address these limitations, a range of gap-filling strategies has been developed for satellite-derived snow cover products [15]. The simplest approaches apply temporal or spatial filters, replacing cloudy pixels with values from nearby days or surrounding clear pixels [11,16]. 
While computationally efficient, these methods struggle to capture rapid snowmelt or accumulation events and perform poorly under extended cloud cover. Data fusion frameworks go further by integrating passive microwave observations, which are unaffected by clouds, with optical retrievals, often incorporating ancillary predictors such as topography and land cover to improve accuracy [17,18]. Physically based and temperature-index snow models offer an alternative pathway by simulating snowpack variables continuously from meteorological forcing data, thereby generating snow cover estimates that are inherently unaffected by cloud cover or polar night conditions, though their accuracy depends heavily on the quality of input forcing and model parameterization [19,20]. Recent machine learning approaches, including random forests, convolutional neural networks, and generalized partial convolution methods, have shown strong performance for SCF gap-filling by learning complex nonlinear relationships from reflectance, terrain, and meteorological inputs [21,22,23,24]. Despite these advances, most existing deep learning approaches treat gap-filling as a purely data-driven interpolation problem without explicitly incorporating physically related state variables. In particular, comparatively few studies have applied deep learning to reconstruct SCF under persistent cloud cover and polar night by explicitly leveraging physically related predictor variables such as SWE [15].

SWE is closely linked to snow accumulation and ablation processes and therefore provides valuable complementary information for SCF estimation [25]. In addition, topographic information derived from a digital elevation model, land cover, and near-surface temperature are known to exert strong controls on snow distribution and persistence [26,27,28]. Integrating these predictors within a deep learning framework offers a promising pathway for improving SCF reconstruction in data-sparse regions [15]. In this study, we present a deep learning–based framework for reconstructing spatially and temporally incomplete SCF observations in high-latitude environments. Focusing on Scandinavia as a representative study region, we employ a U-Net architecture to infer missing SCF values caused by cloud cover and polar night conditions. The model is trained using partially observed ESA CCI L3C SCFV AVHRR v4.0 data together with auxiliary predictors, including snow water equivalent (SWE), land cover, topography, and near-surface temperature, which are harmonized across differing spatial resolutions. Unlike conventional interpolation or rule-based gap-filling methods, the proposed approach learns spatial and temporal snow patterns directly from the data, enabling realistic SCF reconstruction even in regions and periods with little or no optical information [15]. Model performance is evaluated against independent reference datasets, with emphasis on seasonal behavior and wintertime conditions, when observational uncertainty is highest. By generating a spatially and temporally continuous extension of the ESA CCI SCFV AVHRR record, this work contributes to improving snow monitoring capabilities in regions affected by persistent cloud cover and polar night. The results demonstrate the potential of deep learning–based reconstruction methods to complement existing operational snow products and to support applications requiring consistent and spatially detailed snow cover information. 
More broadly, this study highlights the value of combining physically meaningful predictors with modern deep learning architectures to overcome long-standing observational limitations in cryospheric remote sensing.

2. Data and Methods

2.1. Study Area and Data

The study area encompasses Scandinavia (Norway, Sweden, Finland, and Denmark), a region characterized by substantial seasonal snow cover variability, complex terrain, and diverse land cover, including boreal forest, alpine tundra, and coastal lowlands. Scandinavia is particularly suited as a testbed for snow cover fraction (SCF) reconstruction owing to the frequent and persistent cloud contamination affecting optical remote sensing retrievals in this region, creating large data gaps [13] that the proposed method aims to fill. The latitudinal range spans approximately 55°N to 71°N, covering both maritime-influenced western coasts and more continental eastern areas, as well as the low-lying Danish peninsula and archipelago in the south, resulting in a wide spectrum of snow accumulation and melt dynamics that challenge simple empirical gap-filling approaches. Five predictor variables (including SCF, which serves a dual role as partial input and reconstruction target) were used in this study (Table 1).

The ESA CCI L3C SCFV AVHRR v4.0 product [29], provided at 0.05° spatial resolution and daily temporal resolution, serves a dual role in this framework. Where valid observations are available, it acts as a partial input to the model, providing observational context alongside the auxiliary predictors. During training, pixels with valid observations that are synthetically masked are withheld from the input and used as supervised reconstruction targets, allowing the model to learn from known values. At inference time, the model reconstructs genuinely missing pixels caused by cloud cover or polar night, where no optical observation is available.

SWE was obtained from the ESA CCI L3C SWE SSMIS DMSP v3.1 product [29], a passive microwave-based daily estimate unaffected by cloud cover. ERA5-Land 2 m air temperature reanalysis [30] was included given its role in snow accumulation and melt processes. Land cover was derived from the ESA CCI LC L4 300 m P1Y product [31], providing annual static land cover classifications. Elevation was taken from the CGIAR-CSI SRTM 4.1 dataset [32]. To harmonize all input data to the 0.05° SCF target grid, dataset-specific resampling strategies were applied. ERA5-Land temperature fields, originally at 0.1° resolution, were upsampled using bilinear interpolation to preserve spatial gradients. SWE, also at 0.1° resolution, was upsampled using nearest-neighbor resampling, which avoids the introduction of non-physical intermediate values in this retrievable physical quantity and preserves the spatial structure of gaps within the SWE product, preventing the undesirable blending of valid retrievals into missing data areas that bilinear interpolation would introduce. While this approach may introduce blocky spatial artifacts at the 0.05° resolution, this trade-off was accepted as the U-Net’s convolutional operations are expected to smooth residual resampling discontinuities during feature extraction. The ESA CCI land cover product, at 300 m resolution, was downsampled to 0.05° using majority aggregation, ensuring that the dominant land cover class at each 0.05° grid cell is preserved without blending of categorical labels. The spatial distribution of all input predictor channels for an example date is shown in Figure 1.
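
The two gap-preserving and label-preserving resampling steps described above can be illustrated with a minimal sketch. It assumes an exact integer ratio between the grids (factor 2 for 0.1° to 0.05°; the 300 m land cover grid does not divide 0.05° exactly, so a real pipeline would need geographic cell matching), uses `None` as a hypothetical missing-data marker, and the function names are illustrative rather than taken from the study's code.

```python
from collections import Counter


def upsample_nearest(grid, factor=2):
    """Nearest-neighbour upsampling: repeat each cell `factor` times per axis.

    Missing cells (None) stay missing, so no blended values are created
    along the edges of gaps.
    """
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def aggregate_majority(grid, factor):
    """Majority aggregation: each output cell takes the most common class
    inside its factor x factor block of input cells."""
    n_rows, n_cols = len(grid) // factor, len(grid[0]) // factor
    out = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            block = [grid[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            out_row.append(Counter(block).most_common(1)[0][0])
        out.append(out_row)
    return out


# Toy SWE field with one missing coarse cell, and a toy land cover grid.
swe_coarse = [[10.0, None], [30.0, 40.0]]
swe_fine = upsample_nearest(swe_coarse)          # 4 x 4; the gap becomes a 2 x 2 block
land_cover = [[1, 1, 2, 2], [1, 3, 2, 2], [4, 4, 5, 5], [4, 4, 5, 6]]
lc_coarse = aggregate_majority(land_cover, 2)    # dominant class per 2 x 2 block
```

Because nearest-neighbour upsampling only copies values, the missing coarse cell maps to exactly four missing fine cells, which is the gap-preserving property the text relies on.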

The reconstructed SCF was validated against independent daily snow depth measurements from ground-based stations operated by the Norwegian Meteorological Institute (MET Norway) [33], the Finnish Meteorological Institute (FMI) [34], and the Swedish Meteorological and Hydrological Institute (SMHI) [35]. Together, these networks provide spatially distributed in situ observations across the Scandinavian domain, covering a range of climatic zones, elevation bands, and land cover types. Stations from Denmark were excluded as the region experiences no polar night and contains snow cover conditions less representative of the target reconstruction scenarios addressed in this study. The spatial distribution of stations and the number of daily observations per station over the inference period (2000–2014) are shown in Figure 2.

2.2. Data Preprocessing

A unified preprocessing pipeline was applied to all input years prior to model training. A pixel-level quality score was computed for each SCF observation to identify and suppress physically implausible retrievals. A plausibility score was derived by comparing each SCF value against a sigmoidal SWE-based expectation function (with slope parameter k = 4.0), applied to z-score-normalized SWE. A spatial trust score was computed as the complement of the absolute deviation of each pixel from a local rolling spatial mean (window = 10 pixels), serving as a fallback where SWE-based plausibility was unavailable. SCF observations with a quality score below 0.7 (17.2% of valid observations) were removed for the training process to ensure the model learns from reliable observations but retained for the final gap-filled dataset construction to preserve the completeness of the full ESA CCI L3C SCFV AVHRR v4.0 record. SWE, elevation and temperature input predictors were z-score normalized using statistics computed across the full multi-year dataset. Finally, the spatial domain was zero-padded symmetrically to the nearest multiple of 16 pixels in both dimensions to satisfy the downsampling requirements of the U-Net encoder. The land cover classification was remapped from the original ESA CCI scheme [31] to eight aggregated functional classes: forest, shrubland, cropland, sparse vegetation, flooded vegetation, urban, permanent snow and ice, and water bodies. These classes were subsequently encoded as a one-hot representation, yielding a multi-channel binary feature map that was concatenated with the continuous predictors. In addition, a melt proxy feature was derived as the element-wise product of SWE and a normalized temperature activation to explicitly encode the interaction between snowpack and thermal forcing relevant to melt processes (Figure 1f). The complete preprocessed dataset spans the years 2000–2014, covering the full Scandinavian domain at daily temporal resolution.
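
Several of these preprocessing steps are simple enough to sketch. The example below is a hedged illustration, not the study's implementation: the exact form of the plausibility score is not given in the text, so the sketch assumes it is one minus the absolute difference between SCF and the sigmoidal SWE expectation; the grid size of 250 pixels is a made-up example, and all function names are hypothetical.

```python
import math

QUALITY_THRESHOLD = 0.7   # from the text: lower-scoring observations are excluded from training
SLOPE_K = 4.0             # from the text: slope of the sigmoidal expectation function


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def plausibility(scf, swe_z, k=SLOPE_K):
    """ASSUMED form: 1 - |SCF - sigmoid(k * z_SWE)|.

    The source only states that SCF is compared against a sigmoidal
    expectation applied to z-score-normalized SWE.
    """
    return 1.0 - abs(scf - sigmoid(k * swe_z))


def symmetric_padding(size, multiple=16):
    """Return (before, after) zero-padding that lifts `size` to the next multiple."""
    total = (-size) % multiple
    before = total // 2
    return before, total - before


def one_hot(class_index, n_classes=8):
    """Encode one of the eight aggregated land cover classes as binary channels."""
    return [1 if i == class_index else 0 for i in range(n_classes)]


consistent = plausibility(scf=0.95, swe_z=1.0)    # about 0.97: high SCF with above-average SWE
suspicious = plausibility(scf=0.95, swe_z=-1.0)   # about 0.07: high SCF with far below-average SWE
pad_rows = symmetric_padding(250)                 # hypothetical grid height of 250 pixels
```

Under these assumptions the first observation would be kept for training and the second removed, while both would still be retained in the final gap-filled product as described above.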

2.3. Model Architecture and Training

A chronological split was adopted to avoid temporal data leakage. The year 2012 was used exclusively for model training, and the year 2013 for validation. The complete 15-year period (2000–2014) was used for inference, allowing the trained model to reconstruct SCF for years not seen during training and to assess generalization across different snow seasons. During training, synthetic gap masks were applied to simulate data absence patterns arising from both cloud contamination and polar night conditions and to define supervised target pixels. Circular gap blobs with randomized sizes and counts were generated stochastically and applied persistently across groups of ten consecutive days, mimicking the temporal autocorrelation of observational gaps. Pixels that were simultaneously (i) covered by a synthetic gap mask and (ii) associated with a valid, high-quality SCF observation were designated as supervised pixels. The mask parameters were deliberately set to produce a highly challenging reconstruction task: across the training period, only 8.5% of known SCF pixels remained visible to the model on average (mean per-timestep visibility: 8.4% ± 10.6%), with the remainder withheld as supervised targets. This extensive masking ratio was chosen intentionally to replicate the persistent and spatially extensive data gaps caused by cloud cover and polar night in Scandinavia, where solar illumination is absent for prolonged periods and no optical SCF retrieval is possible. At supervised locations, the SCF input was replaced by a sentinel value, forcing the model to reconstruct SCF from the remaining ancillary predictors. To further improve robustness to missing SCF inputs at inference time, an SCF dropout strategy was additionally applied during training in which the entire SCF input channel was replaced with the sentinel value, forcing the model to rely solely on non-optical predictors. 
To provide short-term temporal context, one-day lag features were appended to the input tensor at each timestep, comprising the previous day’s SWE, temperature, and SCF as additional input channels.
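
The synthetic masking scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the maximum blob count, maximum radius, and sentinel value are hypothetical (the source does not report them), and the helper names are not from the study's code.

```python
import random

SENTINEL = -1.0  # hypothetical; the source does not state the sentinel value


def circular_gap_mask(height, width, rng, max_blobs=6, max_radius=8):
    """Return a boolean grid where True marks a synthetically hidden pixel.

    Blob count, centres, and radii are drawn at random; max_blobs and
    max_radius are illustrative values only.
    """
    mask = [[False] * width for _ in range(height)]
    for _ in range(rng.randint(1, max_blobs)):
        cy, cx = rng.randrange(height), rng.randrange(width)
        r = rng.randint(1, max_radius)
        for y in range(max(0, cy - r), min(height, cy + r + 1)):
            for x in range(max(0, cx - r), min(width, cx + r + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= r * r:
                    mask[y][x] = True
    return mask


def masks_for_period(n_days, height, width, seed, group_length=10):
    """One mask per day; the same mask persists within each ten-day group."""
    rng = random.Random(seed)
    masks = []
    for day in range(n_days):
        if day % group_length == 0:
            current = circular_gap_mask(height, width, rng)
        masks.append(current)
    return masks


def supervised_pixels(gap_mask, valid_obs):
    """Supervised targets: hidden by the synthetic mask AND backed by a
    valid, high-quality observation."""
    return [[g and v for g, v in zip(g_row, v_row)]
            for g_row, v_row in zip(gap_mask, valid_obs)]


def mask_scf_input(scf, supervised):
    """Replace the SCF input at supervised pixels with the sentinel value."""
    return [[SENTINEL if s else v for v, s in zip(v_row, s_row)]
            for v_row, s_row in zip(scf, supervised)]


masks = masks_for_period(n_days=25, height=32, width=32, seed=0)
```

Using a different `seed` at inference time, as the study does, yields gap patterns the model has not encountered during training.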

A standard U-Net architecture [36] was employed for spatial SCF reconstruction, comprising a four-level encoder–decoder structure with skip connections and a 1024-channel bottleneck (Figure 3b). The encoder progressively extracts spatial features via convolutional blocks with GroupNorm normalization [37] and max-pooling, while the decoder reconstructs the full-resolution output through bilinear upsampling. The final layer projects to a single output logit, from which SCF estimates are obtained via sigmoid activation. The total number of input channels is determined by the predictor configuration: with SWE, temperature, elevation, one-hot land cover, SCF, the SWE–temperature melt proxy, and one lag day for SWE, temperature, and SCF, the input tensor contains 16 channels (Figure 3a).
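
The channel count and the padding requirement follow from simple bookkeeping, sketched below. The dictionary keys are illustrative names, and the per-level encoder widths are an assumption based on the standard U-Net convention of doubling channels at each level; the source reports only the 1024-channel bottleneck.

```python
# Channels contributed by each predictor group, as listed in the text.
INPUT_CHANNELS = {
    "swe": 1,
    "temperature": 1,
    "elevation": 1,
    "land_cover_one_hot": 8,   # eight aggregated functional classes
    "scf": 1,
    "melt_proxy": 1,
    "lag1_swe": 1,
    "lag1_temperature": 1,
    "lag1_scf": 1,
}
n_channels = sum(INPUT_CHANNELS.values())

ENCODER_LEVELS = 4
BOTTLENECK_CHANNELS = 1024

# Each max-pooling step halves both spatial dimensions, so the padded grid
# must be divisible by 2 ** ENCODER_LEVELS.
required_multiple = 2 ** ENCODER_LEVELS

# ASSUMPTION: channel width doubles per level, as in the original U-Net.
encoder_widths = [BOTTLENECK_CHANNELS // 2 ** (ENCODER_LEVELS - level)
                  for level in range(ENCODER_LEVELS)]
```

This reproduces the 16 input channels stated above and explains why the domain is padded to a multiple of 16 pixels in Section 2.2.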

The SCF distribution in the training data is strongly bimodal, dominated by near-zero (snow-free) and near-unity (fully snow-covered) observations, with relatively few samples in the intermediate range. To counteract this class imbalance, inverse-frequency bin weights were computed across ten uniformly spaced SCF bins. A composite loss function was used during training, combining a logit-space MSE term with a probability-space Huber term and an empirical bias correction. Let \hat{z} denote the predicted logit, y ∈ [0, 1] the true SCF, σ the sigmoid function, and M the set of supervised pixels. The overall loss is given by Equation (1):

L = α \cdot L_{logit} + (1 - α) \cdot L_{Huber} + λ_{bias} \cdot L_{bias}

(1)

with α = 0.35 and λ_bias = 3.0. The logit-space MSE term (Equation (2)) penalizes errors between the predicted logit and the logit-transformed target z*, defined as the logit of the true SCF clipped to (ε, 1 − ε) for numerical stability, weighted by bin weights w(y):

L_{logit} = \frac{\sum_{i \in M} w(y_{i}) (\hat{z}_{i} - z_{i}^{*})^{2}}{\sum_{i \in M} w(y_{i})}.

(2)

The Huber term (Equation (3)) operates in probability space with δ = 0.3, providing robustness to outliers [38]:

L_{Huber} = \frac{\sum_{i \in M} w(y_{i}) \cdot h_{δ}(σ(\hat{z}_{i}) - y_{i})}{\sum_{i \in M} w(y_{i})},

(3)

where

h_{δ}(r) = \begin{cases} \frac{1}{2} r^{2} & \text{if } |r| \leq δ \\ δ \left(|r| - \frac{δ}{2}\right) & \text{otherwise} \end{cases}

(4)
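
Equations (1) to (4) can be written out as a small pure-Python sketch. Two details are assumptions rather than statements from the source: the clipping constant ε is set to a hypothetical 1e-4, and the inverse-frequency weight is taken as one over the number of targets in the same SCF bin. The bias term is passed in as a precomputed number.

```python
import math

EPS = 1e-4  # hypothetical clipping constant; the source names it only as ε


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p, eps=EPS):
    """Logit of p after clipping to (eps, 1 - eps) for numerical stability."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def huber(r, delta=0.3):
    """Equation (4): quadratic near zero, linear beyond delta."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)


def bin_weights(targets, n_bins=10):
    """ASSUMED inverse-frequency weights: 1 / (count of targets in the same bin)."""
    idx = [min(int(y * n_bins), n_bins - 1) for y in targets]
    counts = [0] * n_bins
    for i in idx:
        counts[i] += 1
    return [1.0 / counts[i] for i in idx]


def composite_loss(logits, targets, alpha=0.35, lambda_bias=3.0, l_bias=0.0):
    """Equation (1), evaluated over the supervised pixels only."""
    w = bin_weights(targets)
    w_sum = sum(w)
    # Equation (2): weighted MSE between predicted and target logits.
    l_logit = sum(wi * (z - logit(y)) ** 2
                  for wi, z, y in zip(w, logits, targets)) / w_sum
    # Equation (3): weighted Huber loss in probability space.
    l_huber = sum(wi * huber(sigmoid(z) - y)
                  for wi, z, y in zip(w, logits, targets)) / w_sum
    return alpha * l_logit + (1.0 - alpha) * l_huber + lambda_bias * l_bias


targets = [0.1, 0.5, 0.9]
perfect_logits = [logit(y) for y in targets]
loss_perfect = composite_loss(perfect_logits, targets)     # essentially zero
loss_flat = composite_loss([0.0, 0.0, 0.0], targets)       # predicts SCF = 0.5 everywhere
```

The logit-space term penalizes errors near 0 and 1 more strongly than a probability-space loss would, while the Huber term bounds the influence of large residuals.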

To address systematic biases identified from an initial training run, an empirical bias correction term L_bias was incorporated into the composite loss. Stratified evaluation of the initial model revealed consistent over-prediction in the 0.4–0.6 and 0.6–0.8 SCF ranges and slight under-prediction in the 0.2–0.4 range. For each supervised pixel, the correction activates only when the prediction deviates in the empirically observed direction of error. In all three bins, the penalty is computed as the squared one-sided residual, averaged over the bin. To reflect the differing severity of the observed biases, the bin contributions are scaled proportionally to the bias magnitude. The overall contribution of the bias term to the total loss is controlled by λ_bias = 3.0, selected based on a sweep over λ_bias ∈ {1.0, 2.0, 3.0, 4.0} that balanced bias reduction in the intermediate SCF bins against degradation in the near-unity SCF range. The model was trained using the AdamW optimizer [39] with a learning rate of 3 × 10⁻⁴ and weight decay of 10⁻². A ReduceLROnPlateau scheduler reduced the learning rate by a factor of 0.5 after 4 epochs without improvement in validation loss, with a minimum learning rate of 10⁻⁶. Mixed-precision training (AMP) was used to reduce memory requirements and accelerate computation [40]. Gradients were clipped to a maximum norm of 5.0 to prevent gradient explosion. The model was trained with a batch size of 8; each epoch iterated once over the full training dataset, comprising 46 batches (17,252,774 supervised pixels from the training year 2012). The best model checkpoint was obtained at epoch 55, with early stopping triggered after 15 consecutive epochs without a minimum improvement of 10⁻⁴ in validation loss, resulting in 70 total training epochs. Fixed random seeds were used for both training and inference to ensure reproducibility. The training and validation loss curves alongside the learning rate schedule are provided in Figure S1. 
Channel-wise feature importance was estimated post-training using the Integrated Gradients method [41]. For each input channel, attributions were computed over 16 randomly selected validation batches with 64 interpolation steps, and the mean absolute attribution per channel was reported.
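
Returning to the empirical bias correction in the composite loss, the one-sided penalty can be sketched as below. The per-bin scale factors are hypothetical placeholders (the source states only that they are proportional to the observed bias magnitude), and assigning pixels to bins by their true SCF value is an assumption.

```python
# (low, high, direction, scale)
# direction +1 penalises over-prediction, -1 penalises under-prediction.
# The scale values are HYPOTHETICAL; the source does not report them.
BIAS_BINS = [
    (0.2, 0.4, -1, 0.5),   # slight under-prediction observed
    (0.4, 0.6, +1, 1.0),   # over-prediction observed
    (0.6, 0.8, +1, 1.0),   # over-prediction observed
]


def bias_penalty(preds, targets, bins=BIAS_BINS):
    """Sum over bins of the scaled mean squared one-sided residual.

    A pixel contributes only when its prediction errs in the direction
    that was empirically observed for its bin; errors in the opposite
    direction are left to the other loss terms.
    """
    total = 0.0
    for low, high, direction, scale in bins:
        residuals = [max(0.0, direction * (p - y)) ** 2
                     for p, y in zip(preds, targets) if low <= y < high]
        if residuals:
            total += scale * sum(residuals) / len(residuals)
    return total


over = bias_penalty([0.7], [0.5])     # over-prediction in the 0.4-0.6 bin is penalised
under = bias_penalty([0.3], [0.5])    # under-prediction in the same bin is not
```

Because the penalty is one-sided, it nudges the model away from the known error direction without also punishing deviations in the opposite direction twice.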

Spatial SCF reconstruction was performed over the full 2000–2014 period. At inference time, the synthetic gap mask seed was set to a value different from that used during training to prevent the model from exploiting any memorized gap patterns. Uncertainty estimates were derived using Monte Carlo (MC) Dropout [42], in which the Dropout2d layers were kept active during inference rather than deactivated as in standard evaluation. Thirty stochastic forward passes were performed for each timestep, and the posterior mean and standard deviation of the predicted SCF probabilities were computed using Welford’s online algorithm [43]. The final reconstructed SCF product was generated by filling gaps in the observed SCF field with the model’s posterior mean prediction, while retaining valid observations where available. Training was completed in approximately 25 min on a single NVIDIA RTX 4090 GPU (NVIDIA Corporation, Santa Clara, CA, USA), and full inference over the 15-year period (2000–2014), including 30 MC Dropout forward passes per timestep for uncertainty estimation, required approximately 40 min on two NVIDIA RTX 4090 GPUs. All model training and inference were carried out on the UBELIX high-performance computing (HPC) cluster of the University of Bern.
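
Welford's online algorithm lets the mean and standard deviation be accumulated one forward pass at a time, without storing all 30 predictions. The sketch below shows the update for a single pixel together with the final compositing rule; the sample values stand in for stochastic model outputs, `None` is a hypothetical missing-data marker, and the use of the sample (n − 1) normalisation is an assumption.

```python
import math


class RunningStats:
    """Welford's online algorithm for the running mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # ASSUMPTION: sample standard deviation (n - 1); the source does
        # not state which normalisation was used.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def fill_gaps(observed, predicted_mean):
    """Keep valid observations; insert the prediction only where the
    observation is missing (None)."""
    return [p if o is None else o for o, p in zip(observed, predicted_mean)]


stats = RunningStats()
for sample in [0.62, 0.58, 0.66, 0.60, 0.64]:   # stand-ins for stochastic passes
    stats.update(sample)

filled = fill_gaps([0.3, None, 1.0], [0.4, 0.5, 0.9])
```

In a full implementation the same update would be applied element-wise to whole prediction arrays, so memory use stays constant regardless of the number of passes.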

3. Results

3.1. Model Performance and Baseline Comparison

The overall predictive performance of the U-Net model was evaluated against the withheld supervised pixels across the full 15-year inference period (2000–2014), excluding the summer months (June, July, and August) due to the absence of snow cover. The model achieved an R² of 0.9342 and an RMSE of 0.1127, indicating strong overall agreement between predicted and observed SCF across the full dynamic range. The full performance metrics and the hexbin plot of observed versus predicted SCF are shown in Figure 4.
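
For reference, the two headline metrics and the summer exclusion can be sketched as follows. The sketch assumes that R² denotes the coefficient of determination (one minus the ratio of residual to total sum of squares) rather than a squared correlation coefficient, which the text does not specify; the sample tuples are invented for illustration.

```python
import math

SUMMER_MONTHS = {6, 7, 8}   # June, July, August are excluded from evaluation


def rmse(obs, pred):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination (ASSUMED definition of R²)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def evaluate(samples):
    """samples: iterable of (month, observed SCF, predicted SCF)."""
    kept = [(o, p) for month, o, p in samples if month not in SUMMER_MONTHS]
    obs = [o for o, _ in kept]
    pred = [p for _, p in kept]
    return r_squared(obs, pred), rmse(obs, pred)


toy = [(1, 0.0, 0.1), (2, 1.0, 0.9), (7, 0.5, 0.0), (12, 0.5, 0.5)]
r2, err = evaluate(toy)   # the July sample is dropped before scoring
```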

To contextualize the U-Net performance, four baseline methods were evaluated on the same supervised pixel set: a spatial interpolation baseline (inverse distance weighting), a SWE-based physical baseline (sigmoidal transfer function), and two machine learning baselines trained on the same predictor set, namely XGBoost [44] and Random Forest [45]. The results are summarized in Table 2. Spatial interpolation performed poorest across all metrics (R² = 0.3409), reflecting the limited capacity of distance-weighted interpolation to capture the spatial complexity of snow cover. The SWE sigmoidal baseline improved substantially over spatial interpolation, while XGBoost and Random Forest performed comparably and together represented a strong machine learning reference. The U-Net achieved the best performance across all four metrics (RMSE = 0.1127, R² = 0.9342), though the margin over the tree-based models was modest. Notably, all model-based approaches exhibited near-zero overall bias, though this aggregate measure conceals systematic differences across SCF bins, which are examined in the following section.
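
To make the weakest baseline concrete, inverse distance weighting estimates a hidden pixel as a weighted mean of visible pixels, with weights that decay with distance. The sketch below is a generic formulation; the power parameter of 2 and the use of pixel-index distance are assumptions, since the source does not give the baseline's configuration.

```python
def idw_estimate(target, known, power=2.0):
    """Inverse distance weighting on a pixel grid.

    target: (row, col) of the pixel to estimate.
    known:  list of ((row, col), value) for visible pixels.
    power:  distance exponent; 2.0 is a common default, ASSUMED here.
    """
    numerator = 0.0
    denominator = 0.0
    for (row, col), value in known:
        dist_sq = (row - target[0]) ** 2 + (col - target[1]) ** 2
        if dist_sq == 0:
            return value            # exact hit: return the observation itself
        weight = 1.0 / dist_sq ** (power / 2.0)
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# A snow-covered neighbour one pixel away and a snow-free one two pixels away.
estimate = idw_estimate((0, 0), [((0, 1), 1.0), ((0, 2), 0.0)])
```

Because the estimate depends only on distance to visible SCF pixels, it carries no information about terrain, temperature, or snowpack state, which is consistent with its poor performance when large contiguous areas are masked.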

3.2. Error Analysis

To examine where prediction errors are concentrated across the SCF distribution, RMSE, MAE, and bias were computed separately for five SCF bins of width 0.2 (Figure 5). Errors are lowest in the near-zero bin (RMSE = 0.0781), which dominates the sample with over 47 million observations (Figure 5d), and in the fully snow-covered bin (RMSE = 0.1212). The highest errors occur in the intermediate bins, peaking at RMSE = 0.2331 in the 0.4–0.6 range. This pattern is consistent with the physical complexity of partial snow cover states, where sub-pixel heterogeneity and the sensitivity of SCF to small changes in surface conditions make prediction inherently more difficult [46], and it is further compounded by the higher retrieval uncertainty of the ESA CCI SCFV AVHRR product itself in the partially snow-covered range relative to fully snow-covered conditions [11].
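
The stratified evaluation amounts to grouping prediction errors by the observed SCF value and computing each metric per group. A minimal sketch follows, with bias defined as predicted minus observed so that positive values indicate over-prediction; the function name and toy inputs are illustrative.

```python
import math


def binned_errors(obs, pred, n_bins=5):
    """RMSE, MAE, and bias per SCF bin (five bins of width 0.2 by default)."""
    groups = [[] for _ in range(n_bins)]
    for o, p in zip(obs, pred):
        idx = min(int(o * n_bins), n_bins - 1)   # SCF = 1.0 joins the top bin
        groups[idx].append(p - o)

    results = []
    for idx, errors in enumerate(groups):
        if not errors:
            results.append(None)                 # no observations in this bin
            continue
        n = len(errors)
        results.append({
            "range": (idx / n_bins, (idx + 1) / n_bins),
            "n": n,
            "rmse": math.sqrt(sum(e * e for e in errors) / n),
            "mae": sum(abs(e) for e in errors) / n,
            "bias": sum(errors) / n,
        })
    return results


per_bin = binned_errors(obs=[0.1, 0.1, 0.5, 0.9], pred=[0.2, 0.0, 0.7, 0.8])
```

In the toy input the two errors in the lowest bin cancel, giving zero bias but non-zero RMSE, which illustrates why a near-zero aggregate bias can conceal systematic errors within individual bins.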

The bias structure reveals a clear systematic pattern (Figure 5c): the model over-predicts most strongly in the 0.6–0.8 transition zone (+0.0874) and slightly in the low SCF bin (+0.0144), while under-predicting in the 0.2–0.4 bin (−0.0548) and the fully snow-covered bin (−0.0291). The empirical bias correction term incorporated in the loss function (Section 2.3) partially mitigated both the over-prediction in the 0.6–0.8 bin and the under-prediction in the 0.2–0.4 bin, though residual biases in both ranges remain.

Monthly RMSE and MAE distributions over all years and spatial pixels are shown in Figure 6 for the snow-relevant months. Performance varies substantially across the annual snow cycle. RMSE is lowest in September, reflecting the low SCF during the transition from snow-free to early accumulation conditions, when errors are inherently small in absolute-error metrics. RMSE increases progressively through the accumulation season and reaches its seasonal maximum in December as snow extent and spatial variability increase. Stratified analysis by daily 2 m temperature and SWE conditions (Figure S2) indicates that errors peak during shallow snowpack conditions (0 < SWE ≤ 50 mm) and cold temperatures, consistent with the spatially heterogeneous snow cover characteristic of the snow accumulation season. It should be noted that elevated errors during this period may also partly reflect known accuracy limitations in the underlying ESA CCI AVHRR SCF product, which exhibits increased misclassification during early snow season when snow cover is patchy [11]. A secondary improvement is visible through the spring melt period, with RMSE declining from January levels to a local minimum in May. The same pattern is evident in the monthly MAE distributions.

The spatial distribution of mean predicted SCF closely reproduces the observed mean SCF pattern across Scandinavia (Figure 7a,b), capturing the high snow persistence over the Scandinavian mountain range along the Norwegian–Swedish border, the gradual decrease towards the Fennoscandian lowlands, and the reduced snow cover of southern Sweden and Denmark. The model correctly reproduces the strong altitudinal and latitudinal gradients in mean SCF.

The mean bias map (Figure 7c) reveals a predominantly small positive bias across most of the domain, with localized negative bias areas. The spatial pattern of RMSE in mountainous terrain (Figure 7d) is closely linked to the absence of SWE data in these regions. The passive microwave-based SWE product (ESA CCI SWE SSMIS DMSP v3.1) provides no retrievals over the high-altitude areas of the Scandinavian mountain range, as signal saturation, emission from wet snow, and the coarse sensor footprint render retrievals unfeasible in rugged terrain [29]. As a result, the model receives no SWE signal in exactly the areas with the highest snow cover, depriving it of its most important predictor and leading to elevated estimation errors. Stratification of evaluation pixels by SWE availability confirms this constraint quantitatively: pixels with valid SWE achieve RMSE = 0.0868 and MAE = 0.0355, compared to RMSE = 0.2135 and MAE = 0.1000 for pixels where SWE is missing, a roughly 2.5-fold increase in RMSE. The uncertainty estimates from MC Dropout similarly peak in these mountain regions (Section 3.3), indicating that the model expresses reduced confidence where SWE information is absent. In lowland and boreal areas, bias is small and spatially coherent, indicating reliable reconstruction performance.

3.3. Reconstruction Examples and Prediction Uncertainty

Representative examples of the gap-filling reconstruction are shown for four dates spanning different phases of the snow season. Each panel triplet shows the original observed SCF field with data gaps caused by cloud cover or polar night (left panel), the U-Net prediction reconstructed over the full spatial domain (center panel), and the final gap-filled product in which original ESA CCI L3C SCFV AVHRR v4.0 observations are retained where valid and model predictions are inserted exclusively where observations are missing (right panel). Figure 8a shows a mid-winter example and illustrates the model’s ability to reconstruct spatially coherent snow cover under near-total cloud cover. Despite the absence of optical observations over large parts of the domain, the model produces a physically plausible field that respects the altitudinal gradient of the Scandinavian mountains and the latitudinal snow extent, drawing primarily on SWE, temperature, and lag features. The spring melt example (Figure 8b) demonstrates the model successfully capturing the broad spatial pattern of retreating snow cover, with SCF declining from the mountain ridge towards the lowlands. The autumn onset example (Figure 8c) shows the model’s behavior during early-season snow accumulation, where the first snowfall events create heterogeneous fields that the model captures at a broad scale, though fine-scale patchiness is less well resolved. The fourth example (Figure 8d) illustrates a challenging reconstruction case. The available observations reveal a highly fragmented snow cover pattern over Sweden and Finland, with abrupt transitions between snow-covered and snow-free pixels at relatively low elevations. The model prediction is spatially smoother than the true field, failing to capture the fine-scale, sharp and fragmented transitions of rapidly changing SCF. 
This case is representative of the conditions that drive the elevated RMSE and MAE during snow onset periods: when the snowpack is thin and spatially discontinuous, the coarse-resolution SWE and temperature predictors lack the spatial detail needed to resolve sharp snow boundaries, and the spatially smoothing nature of the convolutional architecture further suppresses abrupt transitions, producing fields that underestimate the degree of sub-pixel patchiness.

Beyond individual reconstruction dates, the temporal evolution of snow-covered area across the full inference period is illustrated in Figure 9. While the observed SCF record is temporally continuous, persistent cloud cover and polar night mask large portions of the domain on any given day, resulting in a systematically lower domain-averaged snow-covered area. The reconstructed time series recovers the full spatial extent of snow cover by filling these obscured pixels, thereby capturing the true amplitude of annual accumulation and melt cycles throughout all snow seasons.

Spatiotemporal uncertainty was quantified using Monte Carlo Dropout with 30 stochastic forward passes, yielding a posterior standard deviation for each pixel and timestep. The time-mean MC uncertainty map (Figure 10a) reveals a clear spatial structure: uncertainty is lowest over the Fennoscandian lowlands and southern Scandinavia, and highest along the Scandinavian mountain range. This elevated uncertainty in mountain regions is consistent with the SWE data limitations discussed in Section 3.2. The seasonal cycle of spatially averaged uncertainty (Figure 10b) shows elevated MC std in November and December. This pattern is consistent with the monthly averaged RMSE seen in Section 3.2.
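The per-pixel standard deviation over repeated stochastic passes can be illustrated with a small NumPy sketch. Everything here is a stand-in: `stochastic_forward` is a toy one-layer model rather than the U-Net, the dropout probability of 0.2 is hypothetical, and the running mean and variance update (in the style of Welford) as well as the choice of the sample standard deviation are assumptions, since the text does not state how the 30 passes were aggregated.

```python
import numpy as np

def stochastic_forward(x, weights, rng, drop_prob=0.2):
    """Toy stand-in for a network whose dropout stays active at inference."""
    keep = rng.random(weights.shape) >= drop_prob
    w = weights * keep / (1.0 - drop_prob)   # inverted-dropout rescaling
    return 1.0 / (1.0 + np.exp(-(x @ w)))    # sigmoid keeps outputs in [0, 1]

def mc_dropout(x, weights, n_passes=30, seed=0, drop_prob=0.2):
    """Running mean and sample std over n_passes stochastic forward passes."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(x.shape[0])
    m2 = np.zeros(x.shape[0])                # running sum of squared deviations
    for k in range(1, n_passes + 1):
        y = stochastic_forward(x, weights, rng, drop_prob)
        delta = y - mean
        mean += delta / k
        m2 += delta * (y - mean)
    return mean, np.sqrt(m2 / (n_passes - 1))

# Four hypothetical pixels with three predictor channels each
x = np.array([[1.0, 0.2, -0.5],
              [0.3, 0.8, 0.1],
              [-0.4, 0.5, 0.9],
              [0.0, -1.0, 0.6]])
weights = np.array([1.2, -0.7, 0.5])
mean, std = mc_dropout(x, weights)
```

The running update avoids holding all passes in memory, which is convenient when each pass is a full daily grid.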

3.4. Feature Importance and In Situ Validation

SWE and 2 m air temperature with their respective lag features and forest land cover emerge as the dominant predictors, confirming that short-term temporal dynamics of the snowpack carry substantial predictive value beyond the instantaneous state (Figure 11). Elevation ranks sixth (0.382), reflecting the strong altitudinal control on snow persistence. Among the land cover classes, forest contributes most strongly, likely because canopy interception and shading exert a dominant control on sub-canopy SCF [47]. It should be noted that these predictors are not fully independent: temperature decreases with elevation following the atmospheric lapse rate, and SWE integrates the cumulative effect of temperature and precipitation over the snowpack. Furthermore, SWE, temperature, their respective lag features, and the melt proxy form a group of physically intercorrelated predictors, and the within-group attribution ranking is sensitive to training data quality rather than necessarily reflecting their relative independent physical importance [48]. The apparent hierarchy in feature importance should therefore be interpreted as reflecting the relative direct contribution of each channel to the model’s predictions rather than the independent physical importance of each variable. Flooded and sparse vegetation contribute at an intermediate level, while shrubland, water bodies, cropland, urban areas, and permanent snow/ice all have near-negligible importance. The melt proxy contributes modestly (0.135), suggesting the derived interaction term provides some additional signal beyond what SWE and temperature independently encode, particularly during melt onset. Notably, the current-day SCF observation and its one-day lag rank as the two least important predictors overall. This is a direct consequence of the training design: with 91.5% of SCF pixels withheld as supervised targets, the SCF input channel is masked for the vast majority of pixels in each training sample, effectively reducing its contribution to near-zero. This confirms that the model has learned to reconstruct SCF primarily from physically informative non-optical predictors rather than relying on the observed SCF input.
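To make the attribution step concrete, the following sketch applies Integrated Gradients to a toy model and normalizes the result to the strongest channel, mirroring the normalization used in Figure 11. It is only an illustration under stated assumptions: the sigmoid-of-weighted-sum `model`, its weights, the all-zero baseline, and the midpoint Riemann sum with 200 steps are all hypothetical choices, and the paper's channel-wise scores presumably aggregate attributions over many pixels and dates.

```python
import numpy as np

def model(x, w):
    """Toy differentiable model: sigmoid of a weighted sum of input channels."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def model_grad(x, w):
    """Analytic gradient of the toy model with respect to its inputs."""
    y = model(x, w)
    return y * (1.0 - y) * w

def integrated_gradients(x, baseline, w, steps=200):
    """Midpoint Riemann-sum approximation of the path integral from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline), w) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

w = np.array([1.5, -1.0, 0.4, 0.0])      # hypothetical channel weights
x = np.array([0.8, 0.3, 0.5, 0.9])       # hypothetical input for one pixel
baseline = np.zeros_like(x)              # assumption: all-zero baseline
attr = integrated_gradients(x, baseline, w)
importance = np.abs(attr) / np.abs(attr).max()   # normalize to the strongest channel
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline. A channel the model ignores (the last one here) receives zero attribution, analogous to the near-zero score of the masked SCF input.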

To evaluate the gap-filled SCF against independent ground observations, each station was spatially assigned to its corresponding 0.05° SCF grid pixel. Snow presence at each station was defined using a snow depth threshold of 2.0 cm, and the corresponding SCF pixel was classified as snow-covered if the SCF value exceeded 0.2. However, it should be noted that comparing point-based ground observations to areal grid pixel values introduces a scale mismatch, as a single station measurement may not be representative of the spatial variability within a 0.05° pixel. Two mutually exclusive evaluation sets were then defined. An observed pair exists when a valid station snow depth measurement and a valid ESA CCI SCFV AVHRR observation are both available for the same pixel and day; these pairs evaluate the baseline satellite product against ground truth. A gap-filled pair exists when a valid station snow depth measurement is available, but no valid ESA CCI observation exists for that pixel and day, such that only the U-Net prediction is available; these pairs evaluate model performance exclusively in real observational gaps. This distinction ensures that the gap-filled product is assessed under genuinely cloud-affected or polar-night conditions rather than on clear-sky pixels already covered by the satellite record. The confusion matrices for both evaluation sets are shown in Figure 12. The gap-filled product achieves an overall accuracy of 86.7% (n = 64,872), compared to 87.8% for the satellite observation baseline (n = 28,486), with identical F1 scores of 88.0% for both (F1 = 2 × (precision × recall)/(precision + recall)), indicating comparable overall classification skill across the two evaluation sets. A sensitivity analysis across snow depth thresholds of 1, 2, and 5 cm and SCF thresholds of 0.1, 0.2, and 0.5 confirms that the gap-filled product consistently achieves accuracy and F1 scores comparable to the ESA CCI L3C SCFV AVHRR v4.0 product (Table S1).
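The evaluation logic described above (thresholding, splitting pairs into the two mutually exclusive sets, and scoring each set) can be sketched in plain Python. The record layout, function names, and toy values are hypothetical. The comparison uses "greater than or equal" for both thresholds, following the caption of Figure 12; this is an assumption, since the text above describes the SCF threshold as "exceeded".

```python
def classify_pairs(pairs, depth_thresh_cm=2.0, scf_thresh=0.2):
    """Count confusion-matrix cells separately for observed and gap-filled pairs.

    Each pair is (snow_depth_cm, observed_scf_or_None, predicted_scf);
    None marks a day without a valid satellite observation.
    """
    empty = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    counts = {"observed": dict(empty), "gap_filled": dict(empty)}
    for depth, obs_scf, pred_scf in pairs:
        if obs_scf is not None:
            name, scf = "observed", obs_scf
        else:
            name, scf = "gap_filled", pred_scf
        truth = depth >= depth_thresh_cm      # station reports snow
        snow = scf >= scf_thresh              # pixel classified as snow-covered
        key = ("t" if truth == snow else "f") + ("p" if snow else "n")
        counts[name][key] += 1
    return counts

def accuracy_and_f1(c):
    """Overall accuracy and F1 = 2 * precision * recall / (precision + recall)."""
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else float("nan")
    precision = c["tp"] / (c["tp"] + c["fp"]) if (c["tp"] + c["fp"]) else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if (c["tp"] + c["fn"]) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1

# Hypothetical station/pixel pairs
pairs = [
    (10.0, 0.9, 0.8), (0.0, 0.0, 0.1), (5.0, 0.1, 0.6), (0.0, 0.5, 0.0),
    (12.0, None, 0.7), (3.0, None, 0.05), (0.0, None, 0.0), (1.0, None, 0.1),
]
counts = classify_pairs(pairs)
obs_acc, obs_f1 = accuracy_and_f1(counts["observed"])
gap_acc, gap_f1 = accuracy_and_f1(counts["gap_filled"])
```

Because a pair enters the gap-filled set only when no valid satellite observation exists, the model prediction is never scored on pixels where the satellite product is already available.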

The inter-annual variability of ΔAccuracy and its monthly breakdown are shown in Figure 13. Across most years, the median ΔAccuracy is close to zero, indicating that the gap-filled predictions match the accuracy of the original satellite observations when evaluated against station data. A notably negative median ΔAccuracy marks years in which the model’s predictions in the gaps were less accurate than the available satellite observations. These negative years do not show a systematic temporal trend, suggesting they reflect episodic anomalous snow conditions rather than a systematic model degradation [49]. Several years show positive medians, where the gap-filled product outperforms the satellite baseline at station locations.

The seasonal breakdown of ΔAccuracy (Figure 13b) reveals that the largest positive differences occur in winter months, suggesting that the U-Net gap-filling may provide accuracy improvements over available satellite observations during deep winter when cloud cover is most persistent. However, this result should be interpreted with caution, as days without valid satellite observations and days with valid observations represent systematically different sampling conditions, and the observed differences may partly reflect contrasting snow cover characteristics between cloudy and clear-sky days rather than genuine reconstruction skill.
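As a rough illustration of how ΔAccuracy (gap-filled accuracy minus satellite-observation accuracy) can be computed per month or per year, the sketch below tallies correct classifications by group and source. The record layout and function name are hypothetical, and the sketch does not reproduce how the medians in Figure 13 were formed (for instance, whether accuracies were first computed per station), which is not specified here.

```python
from collections import defaultdict

def delta_accuracy_by_group(records):
    """Return {group: gap-filled accuracy minus observed accuracy}.

    records: iterable of (group, source, correct), where source is
    'gap_filled' or 'observed' and correct is a bool.
    Groups lacking either source are skipped.
    """
    tallies = defaultdict(lambda: {"gap_filled": [0, 0], "observed": [0, 0]})
    for group, source, correct in records:
        tally = tallies[group][source]
        tally[0] += int(correct)   # number of correct classifications
        tally[1] += 1              # number of pairs
    result = {}
    for group, by_source in tallies.items():
        g_hit, g_n = by_source["gap_filled"]
        o_hit, o_n = by_source["observed"]
        if g_n and o_n:
            result[group] = g_hit / g_n - o_hit / o_n
    return result

# Hypothetical records keyed by calendar month
records = [
    (1, "gap_filled", True), (1, "gap_filled", True),
    (1, "gap_filled", True), (1, "gap_filled", False),
    (1, "observed", True), (1, "observed", False),
    (2, "gap_filled", True),
]
delta = delta_accuracy_by_group(records)
```

Note that the two accuracies in each difference come from different days, which is exactly the sampling caveat raised in the paragraph above.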

4. Discussion

The results confirm that physically meaningful auxiliary predictors, in particular SWE and near-surface temperature, provide sufficient signal to reconstruct SCF in the complete absence of concurrent optical observations, enabling gap-filling under both cloud cover and polar night conditions. Zhang et al. [15], in a recent comprehensive review of reconstruction approaches for polar-orbiting snow cover products, identify the incorporation of spatiotemporal environmental factors, including terrain, temperature, and forest cover, and the further development of machine learning-based reconstruction as the two most important directions for advancing the field. The present framework directly addresses both: it integrates elevation, land cover, reanalysis temperature, and passive microwave SWE within a deep learning architecture and demonstrates that this combination is sufficient to reconstruct SCF even in the complete absence of optical observations. The U-Net outperforms the pixel-wise XGBoost and Random Forest baselines across all metrics, with a particularly notable reduction in MAE (24.7% relative to XGBoost). However, as the baseline models do not incorporate spatial context, the performance difference cannot be unambiguously attributed to the spatial convolutional architecture alone, as it also reflects differences in model capacity and feature representation between deep learning and tree-based approaches. Xiao et al. [11] proposed a spatiotemporal interpolation approach to address data gaps within the ESA CCI SCFV AVHRR dataset, which, however, fails under multi-day cloud cover or polar night. The U-Net framework proposed here is explicitly designed for these conditions and is therefore better suited to the persistent observational gaps characteristic of high-latitude environments.

Performance is notably weaker in the intermediate SCF range (0.2–0.8), reflecting the inherent difficulty of partial snow cover estimation at sub-pixel scales. This is compounded by the fact that the ESA CCI SCFV AVHRR product used as both training target and evaluation reference carries reported RMSEs of 16–19% against high-resolution reference maps [11], meaning that errors in the intermediate range partly reflect uncertainty in the reference data itself rather than model deficiencies alone. A related challenge arises during complex snow onset and melt conditions, when patchy transitional snow cover produces highly heterogeneous distributions that are difficult to capture from coarse-resolution auxiliary predictors. During these periods, small errors in SWE or temperature translate into large SCF errors, and the inherent spatial smoothing of the convolutional architecture further suppresses sharp snow boundaries, producing fields that underestimate patchiness and misplace the transition zone between snow-covered and snow-free areas.
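The stratification of errors by SCF range that underlies this comparison can be sketched as follows. The bin edges (0, 0.2, 0.8, 1) are chosen here only to isolate the intermediate range discussed above and are not necessarily the bins used in Figure 5; the function name and toy values are likewise hypothetical.

```python
import numpy as np

def stratified_metrics(observed, predicted, edges=(0.0, 0.2, 0.8, 1.0)):
    """RMSE, MAE, and mean bias (predicted - observed) per observed-SCF bin."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = predicted - observed
    rows = []
    n_bins = len(edges) - 1
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        upper = observed <= hi if i == n_bins - 1 else observed < hi  # last bin is closed
        in_bin = (observed >= lo) & upper
        if not in_bin.any():
            continue
        e = error[in_bin]
        rows.append({
            "bin": (lo, hi),
            "n": int(in_bin.sum()),
            "rmse": float(np.sqrt(np.mean(e ** 2))),
            "mae": float(np.mean(np.abs(e))),
            "bias": float(np.mean(e)),
        })
    return rows

# Toy evaluation pixels
observed = [0.0, 0.1, 0.5, 0.6, 1.0]
predicted = [0.1, 0.1, 0.3, 0.8, 0.9]
rows = stratified_metrics(observed, predicted)
```

In the toy data the middle bin has zero mean bias but the largest MAE, a reminder that bias alone can hide large compensating errors in the intermediate range.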

The limited availability of SWE data over the Scandinavian mountain range, where the passive microwave sensor provides no valid data due to signal saturation, wet snow emission, and coarse spatial resolution [29], represents a major structural constraint of the current framework. This deprives the model of its single most important predictor in precisely the areas with the highest and most variable snow cover, producing the elevated RMSE and positive bias concentrated along the Scandinavian mountain range visible in Figure 7c,d. The MC Dropout uncertainty estimates corroborate this, peaking in the same regions and confirming that the model is appropriately less confident where SWE data are absent. Zhang et al. [15] similarly highlight complex mountainous terrain as one of the most persistent challenges for snow cover reconstruction, noting that observation blind spots caused by satellite viewing angles and mountain shadows significantly increase uncertainty at the local scale. Mudryk et al. [50] demonstrate through a comprehensive benchmarking of 23 gridded SWE products that all evaluated products show substantially lower skill in mountainous terrain compared to non-mountainous regions. Among the benchmarked products, ERA5-Land [30] and Crocus-ERA5 [51] display the greatest skill in mountainous terrain and could serve as alternative SWE inputs in future implementations of this framework. Physically based snowpack models forced by high-resolution meteorological data, such as SNOWPACK [52] or Alpine3D [53], represent additional promising alternatives for generating spatially detailed SWE estimates in complex terrain. Emerging approaches based on synthetic aperture radar (SAR) data also show promise for SWE retrieval in mountainous regions, though operational global products are not yet widely available.

The Integrated Gradients feature importance analysis reveals a physically interpretable predictor hierarchy consistent with known controls on snow distribution. SWE and 2 m air temperature jointly lead the predictor hierarchy, reflecting their complementary roles in governing snow accumulation and melt processes, followed by forest land cover and the one-day lag features of SWE and temperature. The strong contribution of forest land cover is consistent with canopy effects on sub-canopy SCF through interception and shading [47]. Notably, the observed SCF input channel contributes negligibly, demonstrating that the model reconstructs snow cover almost exclusively from non-optical predictors, precisely the property required for generalization to polar night conditions. The within-group attribution ranking between these physically intercorrelated predictors should be interpreted cautiously, as credit distribution among correlated inputs is sensitive to training data quality rather than reflecting a definitive statement on their relative independent importance [48]. It should further be noted that the SWE-based quality filtering applied during preprocessing may partially reinforce the SWE-SCF relationship in the training data, meaning that the dominant attribution assigned to SWE relative to temperature should be interpreted with caution and may not reflect the true independent physical importance of each predictor.

The model was trained on a single year (2012) and validated on one adjacent year (2013), which raises questions about the representativeness of the training sample. However, inference over the full 15-year period (2000–2014) demonstrates that the learned relationships generalize across a wide range of snow seasons and interannual variability without retraining, suggesting that the physically grounded predictor set confers a degree of robustness beyond what the limited training window might imply.

5. Conclusions and Future Work

This study demonstrated that a U-Net–based deep learning framework can effectively reconstruct SCF under cloud cover and polar night conditions in Scandinavia, using physically meaningful auxiliary predictors in the complete absence of concurrent optical observations. Trained on ESA CCI L3C SCFV AVHRR v4.0 data together with SWE, near-surface air temperature, elevation, and land cover, the model achieves an R² of 0.9342 and RMSE of 0.1127 across the full 15-year inference period, an improvement over spatial interpolation, a SWE-based physical baseline, and pixel-wise machine learning baselines. The margin over tree-based approaches is modest, however, and the primary added value of the framework lies in producing spatially coherent, physically informed SCF reconstructions under persistent cloud cover and polar night conditions rather than in substantially outperforming pixel-wise approaches in standard accuracy metrics.

A particular strength of the framework is its ability to reconstruct SCF almost entirely from non-optical predictors. Feature importance analysis using Integrated Gradients shows that SWE and near-surface temperature together account for the dominant share of predictive signal, with the observed SCF input contributing negligibly. This means the model remains fully operational during polar night and under persistent cloud cover, precisely the conditions where conventional optical retrievals fail entirely. Independent validation against ground station observations from Norway, Finland, and Sweden yields an overall binary classification accuracy of 86.7% and an F1 score of 88.0%, only marginally below the 87.8% accuracy of the available satellite observations at the same station locations, confirming that the gap-filled product is of comparable binary classification quality to the original ESA CCI retrievals.

A natural direction for future work is the extension of this methodology to the full ESA CCI SCFV AVHRR record from 1979 to 2023 and potentially to the global land surface. The Scandinavian domain served as a well-constrained testbed, and the framework is in principle transferable to any region affected by persistent cloud cover or polar night. However, the model was trained exclusively on Scandinavian snow conditions, so transferability to other regions of the Northern Hemisphere with different snow regimes, land cover compositions, and SWE data availability would need to be explicitly tested before a global application can be attempted. Such an extension would produce a spatially and temporally continuous long-term SCF record of unique value for climate trend analysis, hydrological modeling, and cryospheric monitoring at the global scale, substantially expanding the scientific utility of the existing ESA CCI AVHRR SCF product.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs18122030/s1, Figure S1: (a) Training loss (blue) and validation loss (orange) over the model training epochs. The orange dot marks the best validation loss (epoch 55). (b) Learning rate schedule over training. Grey bands indicate epochs at which reductions occurred; Figure S2: Performance metrics of the U-Net evaluated pixel set stratified by daily 2 m air temperature (panels a–c) and SWE (panels d–f) conditions over the full inference period (2000–2014). RMSE (a,d), MAE (b,e), and bias (c,f) are computed across all evaluation pixels whose co-located predictor value falls within the respective bin range. Temperature bins are defined as T < −10 °C, −10 °C ≤ T < 0 °C, 0 °C ≤ T < 5 °C, and T ≥ 5 °C. SWE bins are defined as SWE = 0, 0 < SWE ≤ 50 mm, 50 < SWE ≤ 150 mm, and SWE > 150 mm; Table S1: Sensitivity analysis of binary classification metrics (overall accuracy and F1 score) for the gap-filled product (pred.) and satellite observation baseline (obs.) across nine combinations of snow depth threshold (1, 2, and 5 cm) and SCF threshold (0.1, 0.2, and 0.5). The originally reported combination (snow depth ≥ 2 cm, SCF ≥ 0.2) is highlighted in bold.

Author Contributions

Conceptualization, F.J. and S.W.; methodology, F.J.; software, F.J. and C.N.; validation, F.J.; formal analysis, F.J.; investigation, F.J.; resources, F.J., S.W. and C.N.; data curation, F.J.; writing—original draft preparation, F.J.; writing—review and editing, F.J. and S.W.; visualization, F.J.; supervision, S.W.; project administration, S.W.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been supported by the European Space Agency (ESA) Snow Climate Change Initiative (CCI+) project (4000124098/18/I-NB-SNOW_CCI).

Data Availability Statement

The dataset and code in this study will be openly shared upon publication at https://github.com/fabiojakob/u-net-for-snow-cover-fraction-gap-filling (accessed on 16 June 2026).

Acknowledgments

We acknowledge the UniBern HPC UBELIX; MET Norway, FMI and SMHI for providing the in situ data; and the ESA Snow CCI+ Project led by Thomas Nagler (ENVEO IT GmbH), with project management by Gabriele Schwaizer (ENVEO IT GmbH).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Dong, C. Remote sensing, hydrological modeling and in situ observations in snow cover research: A review. J. Hydrol. 2018, 561, 573–583.
2. Kouki, K.; Luojus, K.; Riihela, A. Evaluation of snow cover properties in ERA5 and ERA5-Land with several satellite-based datasets in the Northern Hemisphere in spring 1982–2018. Cryosphere 2023, 17, 5007–5026.
3. Daloz, A.; Schwingshackl, C.; Mooney, P.; Strada, S.; Rechid, D.; Davin, E.; Katragkou, E.; De Noblet-Ducoudré, M.; Halenka, T.; Breil, M.; et al. Land-atmosphere interactions in sub-polar and alpine climates in the CORDEX flagship pilot study Land Use and Climate Across Scales (LUCAS) models-Part 1: Evaluation of the snow-albedo effect. Cryosphere 2022, 16, 2403–2419.
4. Zhang, Y.; Wang, S.; Barr, A.; Black, T. Impact of snow cover on soil temperature and its simulation in a boreal aspen forest. Cold Reg. Sci. Technol. 2008, 52, 355–370.
5. Rixen, C.; Høye, T.; Macek, P.; Aerts, R.; Alatalo, J.; Anderson, J.; Arnold, P.; Barrio, I.; Bjerke, J.; Björkmann, M.; et al. Winters are changing: Snow effects on Arctic and alpine tundra ecosystems. Arct. Sci. 2022, 8, 572–608.
6. Irannezhad, M.; Ronkanen, A.K.; Malekian, A. Editorial: Climate impacts on snowpack dynamics. Front. Earth Sci. 2022, 10, 970981.
7. Yan, W.; Wang, Y.; Ma, X.; Liu, M.; Yan, J.; Tan, Y.; Liu, S. Snow Cover and Climate Change and Their Coupling Effects on Runoff in the Keriya River Basin during 2001–2020. Remote Sens. 2023, 15, 3435.
8. Berezowski, T.; Nossent, J.; Chormański, J.; Batelaan, O. Spatial sensitivity analysis of snow cover data in a distributed rainfall-runoff model. Hydrol. Earth Syst. Sci. 2015, 19, 1887–1904.
9. Awasthi, S.; Varade, D. Recent advances in the remote sensing of alpine snow: A review. GISci. Remote Sens. 2021, 58, 552–588.
10. Warren, S.G. Optical properties of snow. Rev. Geophys. 1982, 20, 67–89.
11. Xiao, X.; Naegeli, K.; Premier, V.; Li, S.; Neuhaus, C.; Wiesmann, A.; Wunderle, S. Introduction to a 45-year (1979–2023) global daily snow cover fraction product from multiple AVHRR satellites with accuracy assessment. Remote Sens. Environ. 2026, 334, 115235.
12. Hall, D.K.; Riggs, G.A. Accuracy assessment of the MODIS snow products. Hydrol. Process. 2007, 21, 1534–1547.
13. Dietz, A.J.; Wohner, C.; Kuenzer, C. European Snow Cover Characteristics between 2000 and 2011 Derived from Improved MODIS Daily Snow Cover Products. Remote Sens. 2012, 4, 2432–2454.
14. Huang, Y.; Song, Z.; Yang, H.; Yu, B.; Liu, H.; Che, T.; Chen, J.; Wu, J.; Shu, S.; Peng, X.; et al. Snow cover detection in mid-latitude mountainous and polar regions using nighttime light data. Remote Sens. Environ. 2022, 268, 112766.
15. Zhang, J.; Zeng, X.; Wan, J.; Liu, J.; Xia, Z. Advances and prospects in reconstruction approaches for snow cover mapping using polar-orbiting satellites. Front. Earth Sci. 2025, 13, 1649808.
16. Hall, D.K.; Riggs, G.A.; Foster, J.L.; Kumar, S.V. Development and evaluation of a cloud-gap-filled MODIS daily snow-cover product. Remote Sens. Environ. 2010, 114, 496–503.
17. Xiao, X.; He, T.; Liang, S.; Liang, S.; Liu, X.; Ma, Y.; Wan, J. Towards a gapless 1 km fractional snow cover via a data fusion framework. ISPRS J. Photogramm. Remote Sens. 2024, 215, 419–441.
18. Xiao, X.; Liang, S.; He, T.; Wu, D.; Pei, C.; Gong, J. Estimating fractional snow cover from passive microwave brightness temperature data using MODIS snow cover product over North America. Cryosphere 2021, 15, 835–861.
19. Magnusson, J.; Wever, N.; Essery, R.; Helbig, N.; Winstral, A.; Jonas, T. Evaluating snow models with varying process representations for hydrological applications. Water Resour. Res. 2015, 51, 2707–2723.
20. Ruelland, D. Potential of snow data to improve the consistency and robustness of a semi-distributed hydrological model using the SAFRAN input dataset. J. Hydrol. 2024, 631, 130820.
21. Yatheendradas, S.; Kumar, S. A Novel Machine Learning–Based Gap-Filling of Fine-Resolution Remotely Sensed Snow Cover Fraction Data by Combining Downscaling and Regression. J. Hydrometeorol. 2022, 23, 637–658.
22. Hou, J.; Huang, C.; Zhang, Y.; Guo, J.; Gu, J. Gap-Filling of MODIS Fractional Snow Cover Products via Non-Local Spatio-Temporal Filtering Based on Machine Learning Techniques. Remote Sens. 2019, 11, 90.
23. Liu, C.; Huang, X.; Li, X.; Liang, T. MODIS Fractional Snow Cover Mapping Using Machine Learning Technology in a Mountainous Area. Remote Sens. 2020, 12, 962.
24. Xing, D.; Hou, J.; Huang, C.; Zhang, W. Spatiotemporal Reconstruction of MODIS Normalized Difference Snow Index Products Using U-Net with Partial Convolutions. Remote Sens. 2022, 14, 1795.
25. Niu, G.-Y.; Yang, Z.-L. An observation-based formulation of snow cover fraction and its evaluation over large North American river basins. J. Geophys. Res. 2007, 112, D21101.
26. Chu, D.; Liu, L.; Wang, Z. Snow Cover on the Tibetan Plateau and Topographic Controls. Remote Sens. 2023, 15, 4044.
27. Qin, S.; Xiao, P.; Zhang, X. How do snow cover fraction change and respond to climate in Altai Mountains of China? Int. J. Climatol. 2022, 42, 7213–7227.
28. Poussin, C.; Timoner, P.; Chatenoux, B.; Giuliani, G.; Peduzzi, P. Improved Landsat-based snow cover mapping accuracy using a spatiotemporal NDSI and generalized linear mixed model. Sci. Remote Sens. 2023, 7, 100078.
29. ESA CCI Snow. Available online: https://climate.esa.int/en/projects/snow/ (accessed on 27 March 2026).
30. Muñoz-Sabater, J.; Dutra, E.; Agustí-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H.; et al. ERA5-Land: A state-of-the-art global reanalysis dataset for land applications. Earth Syst. Sci. Data 2021, 13, 4349–4383.
31. ESA CCI Land Cover. Available online: https://climate.esa.int/en/projects/land-cover/ (accessed on 27 March 2026).
32. Jarvis, A.; Reuter, H.I.; Nelson, A.; Guevara, E. Hole-Filled Seamless SRTM Data V4; International Centre for Tropical Agriculture (CIAT): Cali, Colombia, 2008; Available online: https://srtm.csi.cgiar.org (accessed on 27 March 2026).
33. Norwegian Meteorological Institute. Available online: https://seklima.met.no/observations/ (accessed on 7 April 2026).
34. Finnish Meteorological Institute. Available online: https://en.ilmatieteenlaitos.fi/download-observations (accessed on 7 April 2026).
35. Swedish Meteorological and Hydrological Institute. Available online: https://www.smhi.se/data/nederbord-och-fuktighet/sno/snowDepth (accessed on 7 April 2026).
36. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
37. Wu, Y.; He, K. Group Normalization. Int. J. Comput. Vis. 2020, 128, 742–755.
38. Huber, P.J. Robust Estimation of a Location Parameter. In Breakthroughs in Statistics; Kotz, S., Johnson, N.L., Eds.; Springer: New York, NY, USA, 1992; pp. 492–518.
39. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
40. Micikevicius, P.; Narang, S.; Alben, J.; Diamos, G.F.; Elsen, E.; García, D.; Ginsburg, B.; Houston, M.; Kuchaiev, O.; Venkatesh, G.; et al. Mixed Precision Training. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
41. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic Attribution for Deep Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
42. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016.
43. Welford, B.P. Note on a Method for Calculating Corrected Sums of Squares and Products. Technometrics 1962, 4, 419–420.
44. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
45. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
46. Rittger, K.; Painter, T.H.; Dozier, J. Assessment of methods for mapping snow cover from MODIS. Adv. Water Resour. 2013, 51, 367–380.
47. Varhola, A.; Coops, N.C.; Weiler, M.; Moore, R.D. Forest canopy effects on snow accumulation and ablation: An integrative review of empirical results. J. Hydrol. 2010, 392, 219–233.
48. Krell, E.; Mamalakis, A.; King, S.; Tissot, P.; Ebert-Uphoff, I. The influence of correlated features on neural network attribution methods in geoscience. Environ. Data Sci. 2025, 4, e29.
49. Räisänen, J. Snow conditions in northern Europe: The dynamics of interannual variability versus projected long-term change. Cryosphere 2021, 15, 1677–1696.
50. Mudryk, L.; Mortimer, C.; Derksen, C.; Elias Chereque, A.; Kushner, P. Benchmarking of snow water equivalent (SWE) products based on outcomes of the SnowPEx+ Intercomparison Project. Cryosphere 2025, 19, 201–218.
51. Buarque, S.; Decharme, B.; Barbu, A.; Franchisteguy, L. Insights into the North Hemisphere daily snowpack at high resolution from the new Crocus–ERA5 product. Earth Syst. Sci. Data 2025, 17, 7227–7249.
52. Schmucki, E.; Marty, C.; Fierz, C.; Lehning, M. Evaluation of modelled snow depth and snow water equivalent at three contrasting sites in Switzerland using SNOWPACK simulations driven by different meteorological data input. Cold Reg. Sci. Technol. 2014, 99, 27–37.
53. Lehning, M.; Völksch, I.; Gustafsson, D.; Nguyen, T.A.; Stähli, M.; Zappa, M. ALPINE3D: A detailed model of mountain surface processes and its application to snow hydrology. Hydrol. Process. 2006, 20, 2111–2128.

Figure 1. Input predictor data visualizations for the example date 25 December 2012. (a) Snow Cover Fraction (SCF; ESA CCI SCFV AVHRR v4.0) with cloud, polar night and water body pixels masked; (b) Snow Water Equivalent (SWE; ESA CCI SWE SSMIS DMSP v3.1), with high-altitude and water body pixels masked due to passive microwave retrieval limitations; (c) 2 m air temperature (ERA5-Land); (d) Digital Elevation Model (CGIAR-CSI SRTM 4.1); (e) remapped land cover classes (ESA CCI LC); (f) Melt proxy derived from SWE and 2 m temperature, with high-altitude and water body pixels masked. All fields are shown at 0.05° resolution after resampling.

Figure 2. Spatial distribution of ground stations used for in situ validation, provided by MET Norway (Norway, pink), FMI (Finland, blue), and SMHI (Sweden, green). A total of 41 stations with 93,358 observations are included. Marker size is proportional to the number of daily snow depth observations available per station over the inference period (2000–2014), ranging from 136 to 5071 observations.

Figure 3. Overview of the proposed methodology. The processing pipeline proceeds from data harmonization through preprocessing and input tensor construction to masked U-Net training and inference. (a) Input tensor stack comprising 16 channels: current-day SCF, SWE, 2 m temperature, elevation, land cover classes, and melt proxy, plus one-day lag features for SWE, 2 m temperature, and SCF. (b) U-Net encoder–decoder with four encoding levels (64, 128, 256, 512 channels), a 1024-channel bottleneck, skip connections, bilinear upsampling, and a final 1 × 1 convolution.

Figure 4. Hexbin scatter plot of observed versus predicted Snow Cover Fraction (SCF) across all supervised evaluation pixels over the full inference period (2000–2014, excluding June–August).

Figure 5. Model performance stratified by SCF range. (a) RMSE, (b) MAE, (c) mean bias, and (d) number of evaluation samples per SCF bin.

Figure 6. Monthly distributions of RMSE (a) and MAE (b) over all evaluation pixels and years (2000–2014), for September–May.

Figure 7. Spatial performance maps over the full inference period (2000–2014). (a) Time-mean observed SCF; (b) time-mean predicted SCF; (c) mean bias (predicted − observed); (d) pixel-wise RMSE. Elevated RMSE along the Scandinavian mountain range is associated with SWE data gaps in high-altitude terrain.

Figure 8. Gap-filling reconstruction examples for (a) a mid-winter date (20 January 2003), (b) a spring date (23 April 2009), (c) an autumn date (23 October 2012) and (d) a complex snow onset scene (15 December 2005). Each row shows: (left) observed SCF with missing pixels in gray; (center) U-Net SCF prediction over the full domain; (right) gap-filled SCF with observed SCF where available.

Figure 9. Domain-averaged snow-covered area (%) over the full inference period (2000–2014). The observed SCF (red) yields systematically lower domain-averaged values because cloud-contaminated and polar-night pixels are excluded from the spatial average, which reduces the effective domain coverage. The gap-filled SCF product (blue) reconstructs these missing pixels and thereby reveals the full amplitude of the annual accumulation and melt cycles.

Figure 10. MC Dropout uncertainty estimates (posterior standard deviation, MC std) from 30 stochastic forward passes. (a) Time-mean MC std map; elevated uncertainty is concentrated along the Scandinavian mountain range. (b) Seasonal cycle of spatially averaged MC std; bars represent the domain-mean MC std and error bars indicate ±1 spatial standard deviation.
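The MC std in this figure is the per-pixel spread across repeated forward passes with dropout left active. The sketch below illustrates the idea with a toy stand-in for the network. Everything other than the 30 passes is an assumption: `make_toy_model` is a hypothetical stochastic predictor (a weighted sum with inverted dropout, clipped to [0, 1]), not the U-Net, and the use of the population standard deviation is a choice made here, since the caption does not say which estimator was used.

```python
import random
import statistics


def make_toy_model(weights, drop_prob=0.2, seed=0):
    """Hypothetical stochastic predictor standing in for a dropout network."""
    rng = random.Random(seed)

    def predict(pixels):
        out = []
        for features in pixels:
            total = 0.0
            for weight, value in zip(weights, features):
                if rng.random() >= drop_prob:
                    # inverted dropout: rescale the terms that are kept
                    total += weight * value / (1.0 - drop_prob)
            out.append(min(1.0, max(0.0, total)))  # clip to the SCF range
        return out

    return predict


def mc_dropout_stats(stochastic_predict, pixels, n_passes=30):
    """Per-pixel mean and standard deviation over repeated stochastic passes."""
    samples = [stochastic_predict(pixels) for _ in range(n_passes)]
    per_pixel = list(zip(*samples))  # one tuple of n_passes values per pixel
    mean = [statistics.fmean(values) for values in per_pixel]
    std = [statistics.pstdev(values) for values in per_pixel]  # assumed estimator
    return mean, std
```

A pixel whose inputs are all zero produces the same output on every pass, so its MC std is exactly zero; pixels whose prediction depends on several dropped terms show a larger spread.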

Figure 11. Channel-wise feature importance estimated by Integrated Gradients, normalized to the maximum channel (SWE = 1.0).

Figure 12. Confusion matrices for binary snow/no-snow classification validated against ground station snow depth measurements (MET Norway, FMI, SMHI; snow depth ≥ 2.0 cm; SCF ≥ 0.2). (a) Gap-filled SCF predictions where satellite observations were missing; (b) ESA CCI L3C SCFV AVHRR v4.0 observations where available.
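The binary comparison behind these matrices can be sketched as follows, using the two thresholds stated in the caption (snow depth ≥ 2.0 cm at the station, SCF ≥ 0.2 in the grid cell). The function name `binary_snow_metrics` and the returned dictionary layout are illustrative assumptions, as is the pairing of each station observation with a single collocated SCF value.

```python
def binary_snow_metrics(scf_values, snow_depths_cm,
                        scf_threshold=0.2, depth_threshold_cm=2.0):
    """Confusion counts, accuracy, and F1 for snow/no-snow classification."""
    tp = fp = fn = tn = 0
    for scf, depth in zip(scf_values, snow_depths_cm):
        predicted_snow = scf >= scf_threshold        # satellite or gap-filled
        observed_snow = depth >= depth_threshold_cm  # ground station reference
        if predicted_snow and observed_snow:
            tp += 1
        elif predicted_snow and not observed_snow:
            fp += 1
        elif observed_snow:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "f1": f1}
```

Running the same function once on gap-filled values and once on the original satellite retrievals gives the two panels' counts under identical thresholds.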

Figure 13. ΔAccuracy (gap-filled minus satellite observation accuracy) evaluated against ground station snow depth measurements (MET Norway, FMI, SMHI; snow depth threshold: 2.0 cm; SCF threshold: 0.2). (a) Inter-annual variability of ΔAccuracy across 2000–2014. (b) Seasonal cycle of ΔAccuracy; positive values indicate the gap-filled product outperforms the satellite baseline. Summer months show ΔAccuracy ≈ 0.

Table 1. Predictor datasets with native spatial and temporal resolutions and temporal coverages.

Name	Dataset	Spatial Resolution	Temporal Resolution	Temporal Coverage
Snow Cover Fraction (SCF)	ESA CCI L3C SCFV AVHRR v4.0	0.05°	Daily	1979–2023
Snow Water Equivalent (SWE)	ESA CCI L3C SWE SSMIS DMSP v3.1	0.1°	Daily	1979–2022
Land Cover	ESA CCI LC L4 300 m P1Y	300 m	Yearly	1992–2022
2 m Temperature	ERA5-Land 2 m Temperature	0.1°	Hourly	1950–present
Elevation	CGIAR-CSI SRTM 4.1	0.05°	Static	Static

Table 2. Performance comparison of the U-Net model and baseline methods on the supervised evaluation pixel set (n = 76,903,080).

Approach	RMSE	MAE	R²	Bias
Spatial Interpolation	0.3567	0.2067	0.3409	−0.0282
SWE Sigmoid	0.1920	0.1184	0.8091	0.0224
XGBoost	0.1234	0.0588	0.9211	−0.0004
Random Forest	0.1240	0.0590	0.9204	−0.0001
U-Net (this study)	0.1127	0.0443	0.9342	0.0040
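For reference, the four scores in Table 2 can be computed as in the sketch below. Two points are assumptions rather than statements from the table: R² is taken here as the coefficient of determination (1 − SS_res/SS_tot), and the function name `regression_metrics` is illustrative. The sign convention for bias (predicted − observed) follows the one given in the Figure 7 caption.

```python
import math


def regression_metrics(observed, predicted):
    """RMSE, MAE, R^2, and mean bias for paired SCF values in [0, 1]."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # predicted - observed
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # assumed definition: coefficient of determination
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "bias": sum(errors) / n,
    }
```

Because errors are squared, RMSE weights large misses more heavily than MAE, which is consistent with the larger gap between the two columns for the weaker baselines.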
