Article

DSMF-Net: A Spatiotemporal Memory Flow Network for Long-Range Prediction of Stratospheric Sudden Warming Events

1 Suzhou Key Laboratory of Bio-Photonics, School of Optical and Electronic Information, Suzhou City University, Suzhou 215104, China
2 State Key Laboratory of Solar Activity and Space Weather, Chinese Academy of Sciences, Beijing 100190, China
3 Advanced Laser Technology Laboratory of Anhui Province, Hefei 230037, China
4 School of Mathematics and Physics, Anqing Normal University, Anqing 246113, China
* Author to whom correspondence should be addressed.
Atmosphere 2025, 16(12), 1316; https://doi.org/10.3390/atmos16121316
Submission received: 20 October 2025 / Revised: 18 November 2025 / Accepted: 19 November 2025 / Published: 21 November 2025
(This article belongs to the Special Issue Atmospheric Modeling with Artificial Intelligence Technologies)

Abstract

Sudden Stratospheric Warmings (SSWs) are extreme polar atmospheric disturbances that significantly impact mid-latitude cold surges, but their early prediction remains a challenge for conventional numerical models. In this study, we propose a video prediction framework for SSW forecasting and introduce a Decoupled Spatiotemporal Memory Flow Network (DSMF-Net) to more effectively capture the dynamic evolution of stratospheric polar vortices. DSMF-Net separates spatial and temporal dependencies using specialized memory flow modules, enabling fine-grained modeling of vortex morphology and dynamic transitions. Experiments on representative SSW events from 2018 to 2021 show that DSMF-Net can reliably predict SSW occurrences up to 20 days in advance while accurately replicating the evolution of polar vortex structures. Compared to baseline models such as the Predictive Recurrent Neural Network (PredRNN) and Motion Recurrent Neural Network (MotionRNN), our method achieves consistent improvements across various metrics, with average reductions of 10.5% in Mean Squared Error (MSE) and 6.4% in Mean Absolute Error (MAE), together with a 0.7% increase in the Structural Similarity Index Measure (SSIM). These findings underscore the potential of deep video prediction frameworks to improve medium-range stratospheric forecasts and bridge the gap between data-driven models and atmospheric dynamics.

1. Introduction

Sudden Stratospheric Warmings (SSWs) are extreme atmospheric disturbances occurring in the polar stratosphere during the boreal winter, marked by an abrupt rise in polar temperatures and a reversal of zonal-mean winds from westerly to easterly [1]. These events significantly disrupt the polar vortex and alter planetary wave activity, thereby affecting mid-latitude weather and climate through stratosphere–troposphere coupling processes [2,3,4]. SSWs are frequently linked to severe cold-air outbreaks across continental regions of the Northern Hemisphere as well as intensified subtropical dry and hot anomalies [5]. Consequently, enhancing the ability to predict SSWs is crucial for disaster mitigation and climate risk management.
Despite ongoing advancements in numerical weather prediction systems, accurately forecasting SSWs at medium- to long-range timescales remains a significant challenge. Ensemble forecasts generally capture SSW signals within five days of onset; however, their detection skill declines sharply beyond two weeks, with fewer than half of the ensemble members correctly predicting the event timing [6,7]. This limitation stems from the highly nonlinear dynamics of stratospheric processes, complex wave–mean flow and wave–wave interactions, and uncertainties related to model resolution, physical parameterizations, and initial conditions [8,9]. As a result, the development of data-driven models capable of automatically extracting key dynamical features and capturing multiscale spatiotemporal dependencies has emerged as a promising direction for improving SSW predictability.
In recent years, artificial intelligence has been increasingly integrated into meteorological modeling, providing complementary strengths to traditional dynamical approaches. Deep learning models have shown strong potential in capturing nonlinear relationships within complex atmospheric systems. For example, Zhang et al. [10] proposed a physics-constrained generative model for short-term precipitation nowcasting; Lam et al. [11] utilized graph neural networks (GNNs) [12] for autoregressive medium-range forecasting; Bi et al. [13] proposed Pangu-Weather, a three-dimensional Earth-specific Transformer that outperformed traditional numerical models in medium-range forecasts of multiple atmospheric fields; and Kochkov et al. [14] coupled neural networks with general circulation models to achieve performance comparable to that of the European Centre for Medium-Range Weather Forecasts (ECMWF) at 15-day lead times.
From a methodological standpoint, atmospheric forecasting can be viewed as a spatiotemporal modeling task that aims to capture both the spatial structures and temporal evolution of dynamic fields. This formulation closely aligns with the video prediction problem in computer vision, where both the input and output consist of sequences of image frames representing evolving physical states. Since the introduction of the Convolutional Long Short-Term Memory (ConvLSTM) network [15], video prediction techniques have been widely applied to meteorological tasks such as precipitation, cloud, and wind field forecasting [16,17]. However, most existing methods have focused primarily on tropospheric processes, and their application to the stratosphere—particularly for predicting the morphological evolution of the polar vortex—remains largely unexplored.
To address this gap, we propose a Decoupled Spatiotemporal Memory Flow Network (DSMF-Net), introducing the video prediction paradigm into SSW forecasting. DSMF-Net explicitly separates spatial and temporal dependencies through a dual-memory ST-LSTM architecture and integrates a memory flow mechanism to improve the modeling of multiscale dynamical evolution. Using potential vorticity (PV) fields from the ERA5 reanalysis dataset as input, our model accurately predicts representative SSW events (2018, 2019, and 2021) up to 20 days in advance, successfully reconstructing both displacement- and split-type vortex evolutions. Compared to baseline models such as PredRNN and MotionRNN, DSMF-Net achieves reductions of approximately 10.5% in MSE and 6.4% in MAE, along with a 0.7% increase in SSIM.
To sum up, the main novelties of this paper can be characterized as follows:
  • We propose DSMF-Net, a novel video prediction framework for stratospheric sudden warming forecasting, which introduces a cross-layer spatiotemporal memory flow and a dual-memory ST-LSTM to decouple long- and short-term dynamical dependencies.
  • A decoupled memory regularization mechanism is designed to enhance long-lead stability and improve physical interpretability without increasing model complexity.
  • Comprehensive experiments demonstrate that DSMF-Net achieves stable 20-day lead forecasts and consistently outperforms existing deep learning and ensemble numerical models in both accuracy and structural consistency.

2. Related Work

2.1. Traditional Approaches to SSW Prediction

The prediction of Sudden Stratospheric Warmings has traditionally relied on general circulation models (GCMs) and statistical approaches. While dynamical ensemble forecasts have demonstrated that current models can generally capture SSW onset signals within a 5–10-day lead time [18,19,20], their predictive skill deteriorates markedly beyond two weeks [21]. This limitation arises from several factors, including insufficient model resolution, accumulated errors in initial conditions and physical parameterizations [22,23], and the inherently complex triggering mechanisms of SSWs. These events may be initiated by tropospheric precursors—such as enhanced planetary wave activity, blocking highs, and the North Pacific dipole pattern [24,25,26,27,28]—or by nonlinear dynamical interactions occurring within the stratosphere itself [29,30]. Recent studies further highlight that model deficiencies in representing planetary wave breaking and the upward propagation of wave energy significantly contribute to forecast degradation at subseasonal timescales [31].
To address these challenges, researchers have developed a range of data-driven methods based on observational statistics and model diagnostics. Techniques such as Empirical Orthogonal Function and Principal Component Analysis have been employed to extract dominant modes of stratospheric variability and identify early warning signals [32,33,34]. Geometric analysis of the polar vortex and extreme value theory have been applied to detect displacement- and split-type precursors [35,36]. Additionally, causal discovery algorithms and conditional probability frameworks have been used to quantify how different circulation states influence the likelihood of SSW occurrence [37]. While these approaches have enhanced the understanding of SSW precursors and improved short-term predictive skill, they remain heavily reliant on expert-driven feature selection and are limited in capturing the nonlinear coupling within the three-dimensional atmospheric system. Furthermore, the rarity of SSW events in observational records [38] constrains sample size and undermines the robustness of statistical learning models for extended-range forecasting.

2.2. Deep Learning-Based SSW Prediction

In recent years, artificial intelligence has achieved significant success across various meteorological forecasting tasks and is increasingly viewed as a promising complement to conventional numerical models. At short-term timescales, physics-constrained generative models and Convolutional Long Short-Term Memory (ConvLSTM) architectures have demonstrated strong capabilities in minute-scale precipitation nowcasting [10]. For medium-range prediction, Graph Neural Networks and 3D Transformer architectures have proven effective in capturing multiscale atmospheric structures [11]. At extended and subseasonal timescales, hybrid dynamical–statistical frameworks and probabilistic learning approaches have achieved performance comparable to or exceeding that of ECMWF ensemble forecasts [39,40]. On interannual timescales, deep convolutional networks have notably extended the predictability of the El Niño–Southern Oscillation up to 17 months [41]. Collectively, these studies demonstrate that deep learning models can effectively learn nonlinear, multiscale dynamical relationships, offering new avenues for enhancing the predictability of extreme weather and climate phenomena [42].
However, deep learning research focused on the stratosphere remains limited, largely due to data scarcity and the complexity of event-based definitions. Most existing studies emphasize continuous-field reconstruction or anomaly detection rather than direct event forecasting. For example, variational autoencoders have been applied to reconstruct displacement- and split-type polar vortex structures [43], while Long Short-Term Memory networks have been used to model the temporal evolution of stratospheric geopotential height and vortex intensity [44]. Additionally, computer vision and statistical learning approaches have been employed to characterize the three-dimensional morphology and precursor patterns of the polar vortex [45,46].

3. Data and Methodology

3.1. Data and Preprocessing

This study utilizes the ERA5 reanalysis dataset provided by the European Centre for Medium-Range Weather Forecasts (ECMWF), covering the period from 1 January 1979 to 31 December 2024. The potential vorticity (PV) and geopotential height fields at the 10 hPa isobaric level are selected as the primary variables. Major SSW events are identified following the widely adopted criterion that defines an onset when the zonal-mean zonal wind at 10 hPa and 60° N reverses from westerly to easterly [47]. The 10 hPa level is chosen for its central role in stratospheric vortex morphology and SSW diagnostics, while PV provides a dynamically conserved quantity that accurately captures vortex boundaries, wave-breaking signatures, and structural deformation during SSW evolution. The data are extracted at a daily 00:00 UTC temporal resolution, yielding one instantaneous snapshot per day. The fields are provided at a horizontal resolution of 1° × 1° on a 180 × 180 grid, resulting in approximately 16,802 daily samples over the analysis period.
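For illustration, the wind-reversal criterion above can be expressed as a short script. The following sketch assumes an xarray-readable ERA5 file of daily 10 hPa zonal wind with coordinates named latitude, longitude, and time; the file name and variable name are hypothetical, and the full operational definition (winter-only onsets, separation between events) is deliberately omitted.

```python
# Minimal sketch of the SSW onset criterion (zonal-mean zonal wind reversal at 10 hPa, 60° N).
# The file path "era5_u_10hPa.nc" and the variable name "u" are illustrative assumptions.
import xarray as xr

def find_ssw_onsets(path="era5_u_10hPa.nc"):
    ds = xr.open_dataset(path)                        # daily zonal wind at 10 hPa
    u = ds["u"].sel(latitude=60.0, method="nearest")  # nearest grid row to 60° N
    u_zm = u.mean(dim="longitude")                    # zonal-mean zonal wind
    # Onset: first day of each westerly-to-easterly reversal (u_zm drops below zero)
    reversal = (u_zm < 0) & (u_zm.shift(time=1) >= 0)
    return u_zm["time"].values[reversal.values]

# Note: operational SSW definitions additionally restrict onsets to boreal winter and
# require a return to westerlies between events; this sketch shows only the core condition.
```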
As shown in Figure 1, to isolate the Northern Hemisphere stratospheric polar vortex, the dataset is spatially cropped to the region between 27° N and 90° N in latitude and 180° W–180° E in longitude. The selected domain is then resampled via bilinear interpolation, resulting in 64 and 256 grid points in the latitudinal and longitudinal directions, respectively. Each processed frame thus has a spatial resolution of 64 × 256. Before model training, each variable is normalized using channel-wise z-score standardization computed from the training set.
Temporally, the daily PV fields are organized into spatiotemporal sequences of 30 frames, with the first 10 frames used as model input and the subsequent 20 frames as prediction or validation. A stride of 30 days is applied to construct non-overlapping sequences, ensuring temporal independence between samples. In total, 560 sequences are generated and split chronologically into training, validation, and test sets containing approximately 384, 96, and 80 sequences, respectively.
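The preprocessing pipeline can be summarized in a few lines; the PyTorch usage, array shapes, and function names below are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch of Section 3.1: bilinear resampling to 64 x 256, z-score normalization
# with training-set statistics, and non-overlapping 30-frame sequences (10 input + 20 target).
import torch
import torch.nn.functional as F

def build_sequences(pv, train_mean, train_std, seq_len=30, stride=30):
    # pv: (T, H, W) daily PV fields already cropped to 27-90° N
    x = torch.as_tensor(pv, dtype=torch.float32).unsqueeze(1)        # (T, 1, H, W)
    x = F.interpolate(x, size=(64, 256), mode="bilinear", align_corners=False)
    x = (x - train_mean) / train_std                                 # channel-wise z-score
    seqs = [x[s:s + seq_len] for s in range(0, x.shape[0] - seq_len + 1, stride)]
    seqs = torch.stack(seqs)                                         # (N, 30, 1, 64, 256)
    return seqs[:, :10], seqs[:, 10:]                                # inputs, targets

# With roughly 16,800 daily frames and a stride of 30, this yields about 560 sequences,
# which are then split chronologically into training, validation, and test sets.
```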

3.2. DSMF-Net

To accurately capture the nonlinear evolution of polar vortex morphology during SSW events—and to address the challenges of long-term dependency and error accumulation in medium-range forecasting—we propose the Decoupled Spatiotemporal Memory Flow Network (DSMF-Net), built upon the PredRNN [5] framework. DSMF-Net integrates a structured memory decoupling mechanism and a dynamic training strategy within an end-to-end recurrent convolutional architecture to explicitly separate long-term (subseasonal) background dynamics from short-term (synoptic-scale) transient perturbations, thereby enhancing forecast stability and physical consistency.
The core components of DSMF-Net include a dual-stream Spatiotemporal Long Short-Term Memory (ST-LSTM) unit, a decoupling regularization term, and a reverse scheduled sampling (RSS) strategy. The dual-memory unit incorporates both temporal memory C and spatiotemporal memory M , forming a zigzag memory flow that facilitates cross-layer information exchange. This idea of enabling cross-layer or directional memory propagation has also been explored in broader spatiotemporal recurrent architectures for video prediction [48]. The overall structure of this memory flow, together with the full DSMF-Net architecture, is illustrated in Figure 2. The decoupling regularization term imposes orthogonality constraints between C and M to achieve multiscale dynamical separation. Meanwhile, the RSS strategy progressively introduces autoregressive signals during the encoding phase, improving the model’s ability to represent long-term non-Markovian dynamics [49].

3.2.1. Decoupled Spatiotemporal Memory Flow

The Decoupled Spatiotemporal Memory Flow (DSMF) unit serves as the fundamental building block enabling spatiotemporal coupling within our network. As illustrated in Figure 3, it extends the conventional ConvLSTM [15] architecture by introducing an additional memory pathway, the spatiotemporal memory M, thereby forming a dual-memory system. This design enables the model to jointly capture long-term temporal coherence through the temporal memory C and localized short-term structural deformation through the spatiotemporal memory M.
In a DSMF unit at layer l and time step t, the state transition depends on the current input $X_t$, the previous hidden state $H_{t-1}^{(l)}$, and the vertically propagated spatiotemporal memory from the preceding layer, $M_t^{(l-1)}$. Here, t indexes the temporal step and (l) indexes the recurrent layer. The operators $*$ and $\circ$ denote convolution and element-wise multiplication, respectively. The temporal memory $C_t^{(l)}$ propagates strictly along the temporal axis ($t \rightarrow t+1$) and preserves large-scale, slowly varying background structures such as the seasonal evolution of the polar vortex and planetary wave patterns.
The gates associated with the temporal memory C are computed using convolutional operators, where $\sigma(\cdot)$ is the logistic sigmoid function and $\tanh(\cdot)$ is the hyperbolic tangent. The variables $i_t$, $f_t$, and $g_t$ represent the input gate, forget gate, and input-modulation gate, respectively. The trainable kernels $W_x$, $W_h$, and $W_m$ denote the convolutions applied to the input $X_t$, the hidden state $H_{t-1}^{(l)}$, and the vertically propagated memory $M_t^{(l-1)}$:
$$
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1}^{(l)} + W_{mi} * M_t^{(l-1)}\right),\\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1}^{(l)} + W_{mf} * M_t^{(l-1)}\right),\\
g_t &= \tanh\left(W_{xg} * X_t + W_{hg} * H_{t-1}^{(l)} + W_{mg} * M_t^{(l-1)}\right).
\end{aligned}
$$
The temporal memory state is updated as:
$$C_t^{(l)} = f_t \circ C_{t-1}^{(l)} + i_t \circ g_t.$$
The spatiotemporal memory $M_t^{(l)}$ follows a distinctive zigzag propagation pathway. It propagates vertically at the same time step t to transfer fine-grained spatial features across network layers, and horizontally ($M_t^{(L)} \rightarrow M_{t+1}^{(1)}$) across time steps. Its update is controlled by an independent set of gates, denoted by $i_t'$, $f_t'$, and $g_t'$:
$$
\begin{aligned}
i_t' &= \sigma\left(W_{xi}' * X_t + W_{hi}' * H_{t-1}^{(l)} + W_{mi}' * M_t^{(l-1)}\right),\\
f_t' &= \sigma\left(W_{xf}' * X_t + W_{hf}' * H_{t-1}^{(l)} + W_{mf}' * M_t^{(l-1)}\right),\\
g_t' &= \tanh\left(W_{xg}' * X_t + W_{hg}' * H_{t-1}^{(l)} + W_{mg}' * M_t^{(l-1)}\right).
\end{aligned}
$$
The spatiotemporal memory state is updated according to:
$$M_t^{(l)} = f_t' \circ M_t^{(l-1)} + i_t' \circ g_t'.$$
The output hidden state $H_t^{(l)}$ integrates both memory representations. The output gate $o_t$ is defined as:
$$o_t = \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1}^{(l)} + W_{co} * C_t^{(l)} + W_{mo} * M_t^{(l)}\right).$$
Finally, $C_t^{(l)}$ and $M_t^{(l)}$ are concatenated along the channel dimension and passed through a $1 \times 1$ convolution for feature fusion:
$$H_t^{(l)} = o_t \circ \tanh\left(W_{1 \times 1} * \mathrm{Concat}\left(C_t^{(l)}, M_t^{(l)}\right)\right).$$
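The gating scheme above can be condensed into a single recurrent cell. The following PyTorch-style sketch mirrors the gate structure and the C/M updates of this subsection; the kernel size, channel layout, and the class name DSMFCell are illustrative assumptions rather than the authors' code.

```python
# Sketch of one dual-memory DSMF unit following the equations above (hypothetical sizes).
import torch
import torch.nn as nn

class DSMFCell(nn.Module):
    def __init__(self, in_ch, hid_ch, kernel=5):
        super().__init__()
        pad = kernel // 2
        # X_t and H_{t-1} feed all seven gates (i, f, g, i', f', g', o);
        # the incoming memory M_t^{(l-1)} feeds only the six memory gates.
        self.conv_x = nn.Conv2d(in_ch, 7 * hid_ch, kernel, padding=pad)
        self.conv_h = nn.Conv2d(hid_ch, 7 * hid_ch, kernel, padding=pad)
        self.conv_m = nn.Conv2d(hid_ch, 6 * hid_ch, kernel, padding=pad)
        self.conv_c_o = nn.Conv2d(hid_ch, hid_ch, kernel, padding=pad)   # W_co
        self.conv_m_o = nn.Conv2d(hid_ch, hid_ch, kernel, padding=pad)   # W_mo
        self.conv_fuse = nn.Conv2d(2 * hid_ch, hid_ch, kernel_size=1)    # 1x1 fusion

    def forward(self, x, h, c, m):
        xi, xf, xg, xip, xfp, xgp, xo = torch.chunk(self.conv_x(x), 7, dim=1)
        hi, hf, hg, hip, hfp, hgp, ho = torch.chunk(self.conv_h(h), 7, dim=1)
        mi, mf, mg, mip, mfp, mgp = torch.chunk(self.conv_m(m), 6, dim=1)

        # Temporal memory C: long-term, slowly varying background (t -> t+1)
        i = torch.sigmoid(xi + hi + mi)
        f = torch.sigmoid(xf + hf + mf)
        g = torch.tanh(xg + hg + mg)
        c_new = f * c + i * g

        # Spatiotemporal memory M: zigzag flow, updated from the layer below
        ip = torch.sigmoid(xip + hip + mip)
        fp = torch.sigmoid(xfp + hfp + mfp)
        gp = torch.tanh(xgp + hgp + mgp)
        m_new = fp * m + ip * gp

        # Output gate reads both updated memories, then 1x1 fusion of C and M
        o = torch.sigmoid(xo + ho + self.conv_c_o(c_new) + self.conv_m_o(m_new))
        h_new = o * torch.tanh(self.conv_fuse(torch.cat([c_new, m_new], dim=1)))
        # The two memory increments (i*g and i'*g') are returned for the decoupling loss.
        return h_new, c_new, m_new, (i * g, ip * gp)
```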

3.2.2. Memory Decoupling Regularization

A critical challenge in dual-memory systems lies in the risk of redundant feature learning, where C and M may encode overlapping information despite their distinct propagation pathways. Such redundancy undermines the core objective of disentangling the multiscale dynamics of SSW events.
To address this, we introduce the Memory Decoupling Regularization ($\mathcal{L}_{\mathrm{decouple}}$), which enforces explicit separation by maximizing the orthogonality between the memory increments of C and M. By focusing on memory increments, the regularization specifically targets the new contributions of the current input to the long-term ($\Delta C$) and short-term ($\Delta M$) representations.
These increments are extracted from the input-modulation terms ($i_t \circ g_t$ and $i_t' \circ g_t'$), which are passed through a shared $1 \times 1$ convolutional layer $W_{\mathrm{decouple}}$ for feature alignment. Here, $\circ$ denotes the element-wise (Hadamard) product and $*$ denotes convolution:
$$\Delta C_t^{(l)} = W_{\mathrm{decouple}} * \left(i_t \circ g_t\right), \qquad \Delta M_t^{(l)} = W_{\mathrm{decouple}} * \left(i_t' \circ g_t'\right).$$
This constraint introduces an essential inductive bias: C is encouraged to encode low-frequency, slowly varying signals (e.g., stratospheric mean flow), while M is specialized for high-frequency, localized perturbations (e.g., wave activity and vortex fragmentation).
The loss $\mathcal{L}_{\mathrm{decouple}}$ is formulated as the sum of the absolute cosine similarities between the memory increment vectors across all channels c, layers l, and time steps t. Here, $\langle \cdot, \cdot \rangle_c$ denotes the dot product on channel c and $\|\cdot\|_c$ denotes the $\ell_2$ norm of the flattened feature map of channel c:
$$\mathcal{L}_{\mathrm{decouple}} = \sum_{t,l,c} \frac{\left|\left\langle \Delta C_t^{(l)}, \Delta M_t^{(l)} \right\rangle_c\right|}{\left\|\Delta C_t^{(l)}\right\|_c \left\|\Delta M_t^{(l)}\right\|_c}.$$
By minimizing this loss, the feature representations $\Delta C$ and $\Delta M$ are pushed toward orthogonality, enabling a more efficient and interpretable decomposition of SSW dynamics in the latent space. Notably, $W_{\mathrm{decouple}}$ is used only during training and does not contribute to model size during inference.
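A minimal sketch of this regularizer, assuming PyTorch and that each DSMF cell exposes its two memory increments (as in the cell sketch above), is given below; the batch-averaging convention is an assumption, and the sums over layers and time steps would be accumulated in the training loop.

```python
# Shared 1x1 alignment convolution followed by per-channel absolute cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryDecoupleLoss(nn.Module):
    def __init__(self, hid_ch):
        super().__init__()
        # W_decouple: used only at training time, discarded at inference
        self.w_decouple = nn.Conv2d(hid_ch, hid_ch, kernel_size=1, bias=False)

    def forward(self, inc_c, inc_m):
        # inc_c = i_t * g_t, inc_m = i'_t * g'_t, each of shape (B, C, H, W)
        dc = self.w_decouple(inc_c).flatten(start_dim=2)   # (B, C, H*W)
        dm = self.w_decouple(inc_m).flatten(start_dim=2)
        cos = F.cosine_similarity(dc, dm, dim=-1)          # per-channel cosine, (B, C)
        return cos.abs().sum(dim=1).mean()                 # sum over channels, mean over batch
```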

3.2.3. Reverse Scheduled Sampling

Achieving stable performance in long-lead SSW prediction requires a training strategy that mitigates the distribution mismatch between training and inference phases. In standard sequence-to-sequence training, the encoder relies heavily on the ground truth input $X_t$, which obscures the need to propagate long-term dependencies through the recurrent states and limits the model's ability to capture non-Markovian dynamics.
To address this limitation, we adopt the Reverse Scheduled Sampling (RSS) strategy, which dynamically adjusts the input distribution during the encoding phase. Specifically, RSS employs a mirrored sampling schedule between the encoder and the forecaster. Here, $\mathrm{Bern}$ denotes a Bernoulli sampling operation, $\hat{X}_t$ represents the model prediction at timestep t, and $\epsilon_k$ and $\eta_k$ are sampling probabilities that depend on the training iteration k:
$$
X_{t+1}^{\mathrm{in}} =
\begin{cases}
\mathrm{Bern}(\epsilon_k) \circ X_{t+1} + \left(1 - \mathrm{Bern}(\epsilon_k)\right) \circ \hat{X}_{t+1}, & t \le T \quad (\text{Encoder}),\\
\mathrm{Bern}(\eta_k) \circ X_{t+1} + \left(1 - \mathrm{Bern}(\eta_k)\right) \circ \hat{X}_{t+1}, & t > T \quad (\text{Forecaster}).
\end{cases}
$$
Encoder (RSS): Increasing the Ground Truth Probability $\epsilon_k$. During the encoding phase, the probability of using ground truth inputs, $\epsilon_k$, progressively increases with the number of training iterations k, typically following a sigmoid or exponential schedule. In the early stages of training (i.e., when $\epsilon_k$ is low), the encoder is more likely to receive its own prediction $\hat{X}_{t+1}$ instead of the true input. This deliberate injection of noise functions as a form of curriculum learning, forcing the encoder to rely on the dual-memory states $(C, M)$ to compensate for absent information and to extract robust non-Markovian features from the historical context.
Forecaster (Scheduled Sampling): Decreasing the Ground Truth Probability $\eta_k$. The forecaster adopts the standard Scheduled Sampling policy [49], where the probability of using ground truth inputs, $\eta_k$, progressively decreases with training, enabling a smooth transition toward the fully autoregressive inference mode required for long-lead prediction.
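A compact sketch of this mirrored sampling policy follows; the particular sigmoid and linear schedules, their constants, and the per-element Bernoulli mask are illustrative assumptions.

```python
# Mix ground truth and model prediction for the next input frame under RSS.
import torch

def rss_mix(x_true, x_pred, k, t, T_in, k_mid=5000, tau=1000.0, eta_decay=2e-4):
    if t < T_in:
        # Encoder: probability of ground truth (epsilon_k) grows with iteration k
        p_truth = torch.sigmoid(torch.tensor((k - k_mid) / tau)).item()
    else:
        # Forecaster: probability of ground truth (eta_k) decays toward 0
        p_truth = max(0.0, 1.0 - eta_decay * k)
    mask = torch.bernoulli(torch.full_like(x_true, p_truth))  # Bernoulli mask per element
    return mask * x_true + (1.0 - mask) * x_pred
```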

3.2.4. Loss Function

The final training objective is defined as a weighted sum of the $\ell_2$ reconstruction loss and the memory decoupling regularization term. Here, $\|\cdot\|_2$ denotes the $\ell_2$ norm and $T_{\mathrm{all}}$ represents the total sequence length:
$$\mathcal{L}_{\mathrm{final}} = \sum_{t=1}^{T_{\mathrm{all}}} \left\|\hat{X}_t - X_t\right\|_2^2 + \lambda \, \mathcal{L}_{\mathrm{decouple}},$$
where λ is a tunable regularization coefficient selected based on validation performance. Model optimization is performed using the Adam optimizer, with gradient clipping applied to maintain numerical stability and ensure robust convergence.
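Assuming the decoupling terms are collected per layer and time step as in the sketches above, the combined objective could be computed roughly as follows; the placeholder value of λ and the batch-mean convention are assumptions.

```python
# Reconstruction loss summed over predicted frames plus the weighted decoupling term.
import torch

def final_loss(preds, targets, decouple_terms, lam=0.1):
    # preds, targets: (B, T_all, C, H, W); decouple_terms: scalar losses per (t, l) pair
    recon = ((preds - targets) ** 2).sum(dim=(2, 3, 4)).mean()
    return recon + lam * torch.stack(decouple_terms).sum()
```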

4. Experiment and Analysis

4.1. Baselines

This study compares DSMF-Net with several representative video prediction baselines:
  • PredRNN [5]: Utilizes spatiotemporal LSTM units and introduces dual flows of hidden and spatiotemporal memories to jointly model localized dynamics and large-scale background evolution.
  • MotionRNN [50]: Decomposes transient variations and motion trends through MotionGRU and cross-layer motion highways, explicitly modeling motion accumulation to mitigate blurring and displacement errors in long-term forecasts.

4.2. Evaluation Metrics

To quantitatively evaluate the performance of the proposed model in forecasting SSW events, we adopt a three-level evaluation scheme encompassing numerical accuracy, structural and perceptual similarity, and event-level consistency. Let the observed sequence be $\{Y_t\}_{t=1}^{T}$ and the predicted sequence be $\{\hat{Y}_t\}_{t=1}^{T}$, both represented as tensors of size $C \times H \times W$.
  • Numerical Error Metrics. We employ Mean Absolute Error (MAE) and Mean Squared Error (MSE) to evaluate pixel-wise intensity deviations:
    $$\mathrm{MAE} = \frac{1}{N} \sum_{t,c,i,j} \left| Y_t(c,i,j) - \hat{Y}_t(c,i,j) \right|,$$
    $$\mathrm{MSE} = \frac{1}{N} \sum_{t,c,i,j} \left( Y_t(c,i,j) - \hat{Y}_t(c,i,j) \right)^2,$$
    where $N = T \times C \times H \times W$. MAE reflects the average deviation, while MSE is more sensitive to outliers, capturing the model's robustness under extreme perturbations and rapid morphological transitions.
  • Structural and Perceptual Similarity Metrics. To assess the model's ability to reproduce polar vortex morphology, we adopt the Structural Similarity Index (SSIM) and the Learned Perceptual Image Patch Similarity (LPIPS). SSIM [51] is a full-reference image quality metric that evaluates similarity in terms of luminance, contrast, and structural information:
    $$\mathrm{SSIM}(Y_t, \hat{Y}_t) = \frac{\left(2\mu_Y \mu_{\hat{Y}} + c_1\right)\left(2\sigma_{Y\hat{Y}} + c_2\right)}{\left(\mu_Y^2 + \mu_{\hat{Y}}^2 + c_1\right)\left(\sigma_Y^2 + \sigma_{\hat{Y}}^2 + c_2\right)},$$
    where $\mu_Y$ and $\mu_{\hat{Y}}$ denote mean values, $\sigma_Y$ and $\sigma_{\hat{Y}}$ denote standard deviations, $\sigma_{Y\hat{Y}}$ is the covariance, and $c_1$, $c_2$ are constants for numerical stability. SSIM values range from 0 to 1, with higher values indicating better structural fidelity. LPIPS [52] measures deep-feature distances using a pre-trained convolutional network (VGGNet in this study), providing a perceptual similarity score that better aligns with human judgment. Unlike pixel-wise metrics, LPIPS captures high-level distortions in spatial structures.
  • Anomaly Correlation Coefficient. To assess phase consistency and dynamical forecasting skill, we adopt the Anomaly Correlation Coefficient (ACC), a standard metric in reanalysis and operational forecasts [53]. ACC measures the spatial correlation between predicted and observed anomalies, i.e., deviations from climatology (a minimal computation sketch of these metrics follows this list):
    $$\mathrm{ACC} = \frac{\sum_i \left(Y_i - \bar{Y}\right)\left(\hat{Y}_i - \bar{\hat{Y}}\right)}{\sqrt{\sum_i \left(Y_i - \bar{Y}\right)^2 \sum_i \left(\hat{Y}_i - \bar{\hat{Y}}\right)^2}},$$
    where i indexes spatial grid points, $Y_i$ and $\hat{Y}_i$ are anomaly values after removing the climatological mean, and $\bar{Y}$ and $\bar{\hat{Y}}$ denote spatial means over the verification domain. Higher ACC values indicate stronger spatial–temporal phase alignment in capturing key dynamical structures such as vortex displacement, splitting, and SSW onset.
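For concreteness, the pixel-wise errors and the ACC defined above can be computed as in the following NumPy sketch; the choice of climatology (here an externally supplied per-grid-point mean) is an assumption.

```python
# Pixel-wise MAE/MSE and anomaly correlation over a single field or a full sequence.
import numpy as np

def mae_mse(y, y_hat):
    diff = y - y_hat
    return float(np.abs(diff).mean()), float((diff ** 2).mean())

def acc(y, y_hat, climatology):
    # Anomalies w.r.t. climatology, with spatial means removed before correlating
    a = (y - climatology).ravel()
    b = (y_hat - climatology).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```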

4.3. Basic Settings

To ensure a fair comparison, all models were trained and evaluated using the same data splits and evaluation protocol, with identical implementations and input–output configurations (10 input frames and 10/20 predicted frames). All experiments were conducted on an RTX-4090 GPU. The main hyperparameter settings used in the experiments are summarized in Table 1.

4.4. Experimental Results

To comprehensively evaluate the predictive performance of the proposed DSMF-Net, we conducted systematic comparisons against representative baseline models, as illustrated in Figure 4. Four widely used evaluation metrics were employed to jointly assess numerical accuracy and perceptual consistency.
As shown in Figure 4a,b, DSMF-Net consistently achieves the lowest error levels in both MSE and MAE across all forecast horizons. Furthermore, the rate of error growth with increasing lead time is notably slower compared to the baselines, highlighting the model’s superior temporal stability and generalization capability for long-range prediction. Figure 4c further demonstrates that DSMF-Net maintains the highest SSIM throughout the prediction window, indicating its strong ability to capture the evolving morphology of the stratospheric polar vortex and preserve fine-scale structural coherence. In Figure 4d, DSMF-Net also achieves the lowest LPIPS values, suggesting that its predictions exhibit stronger perceptual realism and higher visual fidelity relative to the ground truth.
To quantify these improvements, Table 2 reports the detailed metric values under both short-term (10 → 10) and long-term (10 → 20) forecasting settings. Across all evaluation criteria, DSMF-Net consistently outperforms the competing models. In the short-term setting, DSMF-Net achieves a 3.4% reduction in MSE and a 4.5% reduction in MAE compared to MotionRNN, while also delivering the highest SSIM (0.910) and lowest LPIPS (0.081). For the more challenging long-term forecasts, DSMF-Net maintains the lowest numerical errors (MSE = 15.655, MAE = 297.214) and superior perceptual quality (SSIM = 0.901, LPIPS = 0.095), validating its robustness and stability under extended temporal dependencies.

4.5. Evaluation Against Numerical Forecast Models

To further assess the performance of DSMF-Net in comparison with numerical models for Sudden Stratospheric Warming (SSW) prediction, we conducted a benchmark evaluation based on the ACC. Due to discrepancies in vertical resolution between the ERA5 reanalysis data and Subseasonal-to-Seasonal (S2S) prediction models, directly computing PV may introduce bias. To ensure consistency across models, the 10-hPa geopotential height (Z) is used as a unified comparison variable.
The baseline numerical models include ensemble forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) and the China Meteorological Administration (CMA). Figure 5 shows the temporal evolution of ACC for three representative SSW cases. Overall, DSMF-Net consistently outperforms the numerical models, exhibiting higher correlation values and greater temporal stability. For the December 2018 event, both CMA and ECMWF forecasts display a sharp decline in ACC beyond day 10, whereas DSMF-Net maintains values above 0.95 throughout the prediction window. Similar trends are observed in the January 2019 and January 2021 events, where the numerical models suffer a significant drop in ACC as lead time increases. In contrast, DSMF-Net maintains a nearly flat ACC curve, demonstrating superior robustness and generalization in capturing large-scale stratospheric circulation patterns and polar vortex evolution.
Disclaimer. Regarding the nearly flat ACC curves in Figure 5, it should be noted that the relatively stable ACC values observed within the 20-day window reflect the specific dynamical evolution of the selected case-study events and should not be interpreted as evidence that the model can maintain similar skill at substantially longer lead times. DSMF-Net was trained and optimized explicitly for a 10-input-to-20-output configuration, and predictive performance beyond this range would require retraining with longer sequences and a modified sampling strategy. Therefore, the results presented here should not be extrapolated beyond the 20-day forecast horizon.

4.6. Case Study

To further validate the predictive capability of DSMF-Net for real-world SSW events, we conduct case studies on three representative episodes. Following the standard definition of major SSWs—that is, a reversal of the zonal-mean zonal wind from westerly to easterly at 60° N and 10 hPa [47]—the selected events include a split-type SSW on 12 February 2018 (Figure 6a), a displacement-type SSW on 1 January 2019 (Figure 6b), and another displacement-type SSW on 5 January 2021 (Figure 6c). To emulate an operational forecasting scenario, predictions are initialized 20 days before each event, using 10 days of observed input followed by 20 days of forecast evolution.
As illustrated in Figure 6a, for the 2018 split-type event, DSMF-Net exhibits the highest consistency with reanalysis data throughout the 20-day prediction window. It accurately captures the polar vortex’s gradual transition from a single-core structure to a two-center configuration, successfully reproducing the full vortex split by 12 February. In contrast, MotionRNN captures the large-scale morphology but shows slight meridional drift and a phase lag between 29 January and 8 February, resulting in a delayed bifurcation. PredRNN detects the general split signature but produces overly smoothed fields, weakened gradients, and underrepresented vortex intensity. Overall, DSMF-Net demonstrates superior fidelity, temporal coherence, and structural stability relative to both baselines.
For the displacement-type events in 2019 and 2021 (Figure 6b,c), all models broadly capture the poleward or zonal displacement of the vortex. However, DSMF-Net achieves the most accurate alignment with the observed trajectories. MotionRNN exhibits noticeable positional bias near 5 January, while PredRNN suffers from delayed evolution and blurred morphological features. Notably, the 2019 event, which exhibits characteristics of a mixed-type transition [54], is well handled by DSMF-Net, which successfully reconstructs the key phase shift from a stable vortex to a double-core configuration—demonstrating robustness in modeling complex dynamical transitions.
In summary, DSMF-Net preserves high structural fidelity and dynamical responsiveness across a 20-day lead time, achieving both numerical accuracy and physical realism. These results underscore the model’s strength in long-range, interpretable forecasting of stratospheric dynamics.

5. Conclusions

For stratospheric polar vortex prediction, the video forecasting paradigm offers a distinct advantage over traditional numerical models by directly learning pixel-level spatiotemporal evolution from historical sequences, thereby capturing nonlinear dynamical processes in a purely data-driven manner. In this work, we propose DSMF-Net, an end-to-end spatiotemporal learning framework that replaces the conventional numerical integration paradigm. Unlike physics-based numerical models, DSMF-Net introduces a cross-layer spatiotemporal memory flow and a decoupled dual-memory ST-LSTM module to separately model long-term and short-term dynamics, enabling stable temporal consistency and interpretable structural evolution under complex dynamical regimes. The experimental results demonstrate that DSMF-Net significantly outperforms existing baselines in long-lead forecasting of SSW events, achieving superior performance in both anomaly correlation and structural reconstruction accuracy.
The proposed memory decoupling mechanism enables DSMF-Net to effectively capture the nonlinear interactions between large-scale planetary wave activity and localized transient disturbances—without the need to expand convolutional receptive fields or increase parameter count. This architectural design enhances both the model’s generalization stability and dynamical responsiveness, allowing it to accurately reconstruct key processes such as displacement, deformation, and splitting of the polar vortex. Compared with numerical models, DSMF-Net learns complex wave–mean flow interactions and nonlinear feedbacks directly from data, thereby providing more robust and temporally consistent predictions for the onset timing and structural evolution of SSW events.
Building upon these findings, future research will focus on integrating DSMF-Net with physics-constrained neural differential equations or planetary wave models to develop a hybrid prediction framework with interpretable dynamical cores. We also plan to explore the scalability and generalization capacity of the model at longer timescales. Overall, this study demonstrates that the video-prediction-based spatiotemporal modeling paradigm offers a promising new direction for long-range predictability in the climate system and lays a foundation for representation learning of multiscale atmospheric dynamics.

Author Contributions

Conceptualization, X.M. and X.L.; Methodology, X.M. and B.Y.; Validation, F.Z. and B.Y.; Formal Analysis, X.M.; Investigation, B.Y.; Data Curation, F.Z.; Writing—Original Draft, X.M.; Writing—Review and Editing, F.Z. and X.L.; Visualization, B.Y.; Supervision, X.L.; Project Administration, X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Pre-Research Topics of National-Level Projects at Suzhou City University (Grant No. 2023SGY015); the Jiangsu Province Higher Education Basic Science (Natural Science) Research Project (Grant No. 23KJD170006).

Data Availability Statement

This study utilized publicly available datasets at https://cds.climate.copernicus.eu (accessed on 15 May 2025).

Acknowledgments

This research was supported by the Specialized Research Fund for State Key Laboratory of Solar Activity and Space Weather, Chinese Academy of Sciences.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Baldwin, M.P.; Ayarzagüena, B.; Birner, T.; Butchart, N.; Butler, A.H.; Charlton-Perez, A.J.; Domeisen, D.I.V.; Garfinkel, C.I.; Garny, H.; Gerber, E.P.; et al. Sudden stratospheric warmings. Rev. Geophys. 2021, 59, e2020RG000708. [Google Scholar] [CrossRef]
  2. Baldwin, M.P.; Dunkerton, T.J. Stratospheric harbingers of anomalous weather regimes. Science 2001, 294, 581–584. [Google Scholar] [CrossRef]
  3. Kidston, J.; Scaife, A.A.; Hardiman, S.C.; Mitchell, D.M.; Butchart, N.; Baldwin, M.P.; Gray, L.J. Stratospheric influence on tropospheric jet streams, storm tracks and surface weather. Nat. Geosci. 2015, 8, 433–440. [Google Scholar] [CrossRef]
  4. Baldwin, M.P.; Stephenson, D.B.; Thompson, D.W.J.; Dunkerton, T.J.; Charlton, A.J.; O’Neill, A. Stratospheric memory and skill of extended-range weather forecasts. Science 2003, 301, 636–640. [Google Scholar] [CrossRef]
  5. Wang, Y.; Long, M.; Wang, J.; Gao, Z.; Yu, P.S. PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  6. He, Y.; Zhu, X.; Sheng, Z.; He, M. Resonant waves play an important role in the increasing heat waves in Northern Hemisphere mid-latitudes under global warming. Geophys. Res. Lett. 2023, 50, e2023GL104839. [Google Scholar] [CrossRef]
  7. Karpechko, A.Y.; Charlton-Perez, A.; Balmaseda, M.; Tyrrell, N.; Vitart, F. Predicting sudden stratospheric warming 2018 and its climate impacts with a multimodel ensemble. Geophys. Res. Lett. 2018, 45, 13538–13546. [Google Scholar] [CrossRef]
  8. Jucker, M.; Reichler, T. Dynamical precursors for statistical prediction of stratospheric sudden warming events. Geophys. Res. Lett. 2018, 45, 13124–13132. [Google Scholar] [CrossRef]
  9. Lindgren, E.A.; Sheshadri, A. The role of wave–wave interactions in sudden stratospheric warming formation. Weather Clim. Dyn. 2020, 1, 93–109. [Google Scholar] [CrossRef]
  10. Zhang, Y.; Long, M.; Chen, K.; Xing, L.; Jin, R.; Jordan, M.I.; Wang, J. Skilful nowcasting of extreme precipitation with NowcastNet. Nature 2023, 619, 526–532. [Google Scholar] [CrossRef] [PubMed]
  11. Lam, R.; Sanchez-Gonzalez, A.; Willson, M.; Wirnsberger, P.; Fortunato, M.; Alet, F.; Ravuri, S.; Ewalds, T.; Eaton-Rosen, Z.; Hu, W.; et al. Learning skillful medium-range global weather forecasting. Science 2023, 382, 1416–1421. [Google Scholar] [CrossRef]
  12. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef]
  13. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks. Nature 2023, 619, 533–538. [Google Scholar] [CrossRef]
  14. Kochkov, D.; Yuval, J.; Langmore, I.; Norgaard, P.; Smith, J.; Mooers, G.; Klöwer, M.; Lottes, J.; Rasp, S.; Düben, P.; et al. Neural general circulation models for weather and climate. Nature 2024, 632, 1060–1066. [Google Scholar] [CrossRef]
  15. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.-K.; Woo, W.-C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28. [Google Scholar]
  16. Tang, S.; Li, C.; Zhang, P.; Tang, R. SwinLSTM: Improving spatiotemporal prediction accuracy using Swin Transformer and LSTM. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 13470–13479. [Google Scholar]
  17. Le Guen, V.; Thome, N. Disentangling physical dynamics from unknown factors for unsupervised video prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11474–11484. [Google Scholar]
  18. Tripathi, O.P.; Charlton-Perez, A.; Sigmond, M.; Vitart, F. Enhanced long-range forecast skill in boreal winter following stratospheric strong vortex conditions. Environ. Res. Lett. 2015, 10, 104007. [Google Scholar] [CrossRef]
  19. Domeisen, D.I.V.; Grams, C.M.; Papritz, L. The role of North Atlantic–European weather regimes in the surface impact of sudden stratospheric warming events. Weather Clim. Dyn. 2020, 1, 373–388. [Google Scholar] [CrossRef]
  20. Garfinkel, C.I.; Son, S.-W.; Song, K.; Aquila, V.; Oman, L.D. Stratospheric variability contributed to and sustained the recent hiatus in Eurasian winter warming. Geophys. Res. Lett. 2017, 44, 374–382. [Google Scholar] [CrossRef]
  21. Hitchcock, P.; Butler, A.; Charlton-Perez, A.; Garfinkel, C.I.; Stockdale, T.; Anstey, J.; Mitchell, D.; Domeisen, D.I.V.; Wu, T.; Lu, Y.; et al. Stratospheric Nudging And Predictable Surface Impacts (SNAPSI): A protocol for investigating the role of stratospheric polar vortex disturbances in subseasonal to seasonal forecasts. Geosci. Model Dev. 2022, 15, 5073–5092. [Google Scholar] [CrossRef]
  22. Rao, J.; Ren, R.; Chen, H.; Yu, Y.; Zhou, Y. The stratospheric sudden warming event in February 2018 and its prediction by a climate system model. J. Geophys. Res. Atmos. 2018, 123, 13332–13345. [Google Scholar] [CrossRef]
  23. He, Y.; Zhu, X.; Sheng, Z.; He, M. Identification of stratospheric disturbance information in China based on the round-trip intelligent sounding system. Atmos. Chem. Phys. 2024, 24, 3839–3856. [Google Scholar] [CrossRef]
  24. Polvani, L.M.; Waugh, D.W. Upward wave activity flux as a precursor to extreme stratospheric events and subsequent anomalous surface weather regimes. J. Clim. 2004, 17, 3548–3554. [Google Scholar] [CrossRef]
  25. Martius, O.; Polvani, L.M.; Davies, H.C. Blocking precursors to stratospheric sudden warming events. Geophys. Res. Lett. 2009, 36, L14806. [Google Scholar] [CrossRef]
  26. Cohen, J.; Jones, J. Tropospheric precursors and stratospheric warmings. J. Clim. 2011, 24, 6562–6572. [Google Scholar] [CrossRef]
  27. Lehtonen, I.; Karpechko, A.Y. Observed and modeled tropospheric cold anomalies associated with sudden stratospheric warmings. J. Geophys. Res. Atmos. 2016, 121, 1591–1610. [Google Scholar] [CrossRef]
  28. Karpechko, A.Y.; Hitchcock, P.; Peters, D.H.W.; Schneidereit, A. Predictability of downward propagation of major sudden stratospheric warmings. Q. J. R. Meteorol. Soc. 2017, 143, 1459–1470. [Google Scholar] [CrossRef]
  29. Scott, R.K.; Polvani, L.M. Stratospheric control of upward wave flux near the tropopause. Geophys. Res. Lett. 2004, 31, L02115. [Google Scholar] [CrossRef]
  30. Ming, A.; Hitchcock, P.; Haynes, P. The double peak in upwelling and heating in the tropical lower stratosphere. J. Atmos. Sci. 2016, 73, 1889–1901. [Google Scholar] [CrossRef]
  31. Hirahara, S.; Ishikawa, I.; Fujii, Y.; Nakano, H.; Tsujino, H.; Adachi, Y.; Naoe, H. Tropospheric and stratospheric boreal winter jet response to eddying ocean in a seasonal forecast system. J. Geophys. Res. Atmos. 2024, 129, e2023JD040444. [Google Scholar] [CrossRef]
  32. Ren, R.; Cai, M. Polar vortex oscillation viewed in an isentropic potential vorticity coordinate. Adv. Atmos. Sci. 2006, 23, 884–900. [Google Scholar] [CrossRef]
  33. Lu, C.; Ding, Y. Analysis of isentropic potential vorticities for the relationship between stratospheric anomalies and the cooling process in China. Sci. Bull. 2015, 60, 726–738. [Google Scholar] [CrossRef]
  34. Lu, C.; Zhou, B.; Ding, Y. Decadal variation of the Northern Hemisphere annular mode and its influence on the East Asian trough. J. Meteorol. Res. 2016, 30, 584–597. [Google Scholar] [CrossRef]
  35. Hannachi, A.; Mitchell, D.; Gray, L.; Charlton-Perez, A. On the use of geometric moments to examine the continuum of sudden stratospheric warmings. J. Atmos. Sci. 2011, 68, 657–674. [Google Scholar] [CrossRef]
  36. Mitchell, D.M.; Charlton-Perez, A.J.; Gray, L.J. Characterizing the variability and extremes of the stratospheric polar vortices using 2D moment analysis. J. Atmos. Sci. 2011, 68, 1194–1213. [Google Scholar] [CrossRef]
  37. Kretschmer, M.; Runge, J.; Coumou, D. Early prediction of extreme stratospheric polar vortex states based on causal precursors. Geophys. Res. Lett. 2017, 44, 8592–8600. [Google Scholar] [CrossRef]
  38. Butler, A.H.; Sjoberg, J.P.; Seidel, D.J.; Rosenlof, K.H. A sudden stratospheric warming compendium. Earth Syst. Sci. Data 2017, 9, 63–76. [Google Scholar] [CrossRef]
  39. Chen, L.; Zhong, X.; Li, H.; Wu, J.; Lu, B.; Chen, D.; Xie, S.-P.; Wu, L.; Chao, Q.; Lin, C.; et al. A machine learning model that outperforms conventional global subseasonal forecast models. Nat. Commun. 2024, 15, 6425. [Google Scholar] [CrossRef]
  40. Chen, Y.-C.; Liang, Y.-C.; Wu, C.-M.; Huang, J.-D.; Lee, S.H.; Wang, Y.; Zeng, Y.-J. Exploiting a variational auto-encoder to represent the evolution of sudden stratospheric warmings. Environ. Res. Clim. 2024, 3, 025006. [Google Scholar] [CrossRef]
  41. Ham, Y.-G.; Kim, J.-H.; Luo, J.-J. Deep learning for multi-year ENSO forecasts. Nature 2019, 573, 568–572. [Google Scholar] [CrossRef] [PubMed]
  42. Tang, Y.; Dong, P.; Tang, Z.; Chu, X.; Liang, J. VMRNN: Integrating vision mamba and LSTM for efficient and accurate spatiotemporal forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 5663–5673. [Google Scholar]
  43. Mooers, G.; Tuyls, J.; Mandt, S.; Pritchard, M.; Beucler, T.G. Generative modeling of atmospheric convection. In Proceedings of the 10th International Conference on Climate Informatics, Virtual, 22–25 September 2020; pp. 98–105. [Google Scholar]
  44. Peng, K.; Cao, X.; Liu, B.; Guo, Y.; Xiao, C.; Tian, W. Polar vortex multi-day intensity prediction relying on new deep learning model: A combined convolution neural network with long short-term memory based on Gaussian smoothing method. Entropy 2021, 23, 1314. [Google Scholar] [CrossRef]
  45. Lawrence, Z.D.; Manney, G.L. Characterizing stratospheric polar vortex variability with computer vision techniques. J. Geophys. Res. Atmos. 2018, 123, 1510–1535. [Google Scholar] [CrossRef]
  46. de Fondeville, R.; Wu, Z.; Székely, E.; Obozinski, G.; Domeisen, D.I.V. Improved extended-range prediction of persistent stratospheric perturbations using machine learning. Weather. Clim. Dyn. 2023, 4, 287–307. [Google Scholar] [CrossRef]
  47. Charlton, A.J.; Polvani, L.M. A new look at stratospheric sudden warmings. Part I: Climatology and modeling benchmarks. J. Clim. 2007, 20, 449–469. [Google Scholar] [CrossRef]
  48. Kalchbrenner, N.; Oord, A.; Simonyan, K.; Danihelka, I.; Vinyals, O.; Graves, A.; Kavukcuoglu, K. Video pixel networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1771–1779. [Google Scholar]
  49. Bengio, S.; Vinyals, O.; Jaitly, N.; Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. In Proceedings of the Advances in Neural Information Processing Systems 28, Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
  50. Wu, H.; Yao, Z.; Wang, J.; Long, M. MotionRNN: A flexible model for video prediction with spacetime-varying motions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15435–15444. [Google Scholar]
  51. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  52. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
  53. World Meteorological Organization. Guidelines on Ensemble Prediction Systems and Forecasting; WMO-No. 1091; World Meteorological Organization: Geneva, Switzerland, 2012.
  54. Rao, J.; Garfinkel, C.I.; Chen, H.; White, I.P. The 2019 new year stratospheric sudden warming and its real-time predictions in multiple S2S models. J. Geophys. Res. Atmos. 2019, 124, 11155–11174. [Google Scholar] [CrossRef]
Figure 1. Diagram of the data preprocessing workflow.
Figure 2. (a) The spatiotemporal memory flow architecture that uses ConvLSTM as the building block. (b) The main architecture of DSMF-Net. The red arrows denote the state transition paths of $M_t^{(l)}$.
Figure 3. The ST-LSTM unit with twisted memory states that serves as the building block, where the blue circles denote the unique structures compared with ConvLSTM.
Figure 4. Comparison with baseline models. (a) Comparison of MSE. (b) Comparison of MAE. (c) Comparison of SSIM. (d) Comparison of LPIPS. Higher SSIM and lower LPIPS indicate better structural and perceptual consistency.
Figure 5. Comparison of ACC performance between the proposed DSMF-Net and numerical forecast systems across three representative SSW cases. (a) December 2018 SSW event; (b) January 2019 SSW event; (c) January 2021 SSW event.
Figure 6. Three representative SSW cases are used to evaluate the prediction performance of the models. "GT" denotes the ground truth fields. (a) Forecasting the split-type SSW event on 12 February 2018, with a 20-day lead time. (b) Forecasting the displacement-type SSW event on 1 January 2019, with a 20-day lead time. (c) Forecasting the displacement-type SSW event on 5 January 2021, with a 20-day lead time.
Table 1. Hyperparameter settings.

Parameter            Setting
Image size (H × W)   64 × 256
Hidden channels      128, 128, 128, 128
Dropout              0.1
Learning rate        3 × 10⁻⁴
Batch size           8
Max iterations       20,000
Optimizer            Adam
Table 2. Quantitative comparison of different models under short-term and long-term forecasting settings. ↑ means higher is better and ↓ means lower is better.

Model                MSE (↓)    MAE (↓)    SSIM (↑)   LPIPS (↓)
MotionRNN (10→10)    12.387     265.633    0.906      0.087
PredRNN (10→10)      13.411     281.255    0.904      0.088
DSMF-Net (10→10)     11.969     253.277    0.910      0.081
MotionRNN (10→20)    17.717     319.742    0.895      0.100
PredRNN (10→20)      17.281     315.625    0.894      0.098
DSMF-Net (10→20)     15.655     297.214    0.901      0.095
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
