1. Introduction
The stratosphere begins at approximately 8–10 km altitude near the polar regions and 15–20 km in the tropical regions, and extends to an average height of about 50 km [1]. Sudden stratospheric warming (SSW) is defined as a sudden increase in stratospheric polar air temperature (Ts), mainly during winter months [2]. The tropopause is at its lowest over the poles because of the cold, dense air mass there, especially in winter. In the Northern Hemisphere (NH), approximately every 28 months, the extremely cold polar vortex warms abruptly within a few weeks, while the strong westerly zonal winds (Uh) significantly weaken or reverse to easterlies. In the literature, these events are known as SSWs [3].
SSW events were first observed in the early 1950s [4] and can be monitored in detail using satellite observations [3]. Over time, significant progress has been made in understanding the dynamic effects of SSWs through detailed observations and simulations. However, analyzing how SSWs affect both the surface air temperature and upper atmosphere dynamics remains a challenge. Recent studies [5,6] have examined stratospheric processes that affect operational forecasting. These works, respectively, applied spatiotemporal memory flow networks to long-range SSW prediction and analyzed unprecedented winter extremes with direct surface impacts; both underscore the critical role of stratospheric diagnostics in operational forecasting. In the NH, SSWs are generally observed in the polar regions (poleward of 60° N); they are rare in the Southern Hemisphere (SH) because tropospheric planetary wave activity is weaker there [7].
The first recorded SSW event was discovered by Richard Scherhag in January 1952 [8], using radiosonde measurements collected in Berlin, Germany. SSWs occur irregularly. In both observations and simulations, SSWs can be detected by a sudden increase in the average Ts (exceeding 30–40 K) in the mid-stratosphere at a height of approximately 30 km (10 hPa) [3,9,10]. However, these warmings sometimes exceed 70 K, creating extreme SSW events. Such warmings result from the rapid intensification of upward-propagating tropospheric planetary waves, which disrupt the stratospheric general circulation [9,11,12]. Tropospheric planetary waves provide the requisite dynamical forcing for SSWs, while the Quasi-Biennial Oscillation (QBO) effectively governs the stratospheric waveguide [13,14]. This indicates that while upward-propagating waves are the primary drivers, the QBO phase regulates the background conditions under which these waves interact with the polar vortex.
The World Meteorological Organization (WMO) established the Commission for Atmospheric Sciences (CAS) and launched the STRATALERT program in 1964 to monitor SSWs, following the January–February 1952 event. In this program, SSW phenomena were studied using radiosonde and rocketsonde observations [13]. The results of this field campaign showed that SSWs are related to polar vortex disruptions attributed to several factors, such as wind reversals during QBO events [14,15]. The QBO is a periodic shift between easterly and westerly winds in the tropical stratosphere, with a period of 28 to 30 months [4]. These results suggested that SSW events play an important role in polar vortex development and in its effect on mid-latitude weather systems.
Major studies since the 1970s analyzing Ts increases and Uh reversals pointed out that the definitions and thresholds used for SSWs vary [16,17]. McInturff et al. [9] suggest that a temperature anomaly (ΔTs) of >25 K can be used as a criterion for SSW detection and monitoring. These studies suggested that major category SSWs were clearly associated with the QBO phase slowdown or reversal [9]. Charlton and Polvani [15] refined the definition provided by McInturff et al. [9]. Their framework identifies major SSW events based on the reversal of Uh at the 10 hPa level at 60° N latitude during winter months. While these events are fundamentally driven by the upward propagation of tropospheric planetary waves, the QBO acts as a key modulator that influences the stratospheric waveguide and the conditions under which these wave-driven reversals occur.
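The wind-reversal criterion described above can be illustrated with a minimal sketch. It assumes a plain list of daily zonal-mean zonal wind values (m/s) at 10 hPa and 60° N, and it only flags the first easterly day of each reversal; any further conditions of the published definition (for example, restricting to winter months) are deliberately left out. The function name `reversal_onsets` and the sample values are hypothetical.

```python
def reversal_onsets(u_daily):
    """Return indices of days on which the wind switches from
    westerly (>= 0 m/s) to easterly (< 0 m/s)."""
    onsets = []
    for i in range(1, len(u_daily)):
        if u_daily[i - 1] >= 0.0 and u_daily[i] < 0.0:
            onsets.append(i)
    return onsets


# Hypothetical daily Uh values (m/s), not taken from ERA5 or any model.
u = [25.0, 12.0, 3.0, -4.0, -9.0, -2.0, 5.0, 14.0, -1.0, 6.0]
print(reversal_onsets(u))
```

In practice the input would be restricted to the winter season before calling the function, since the criterion applies to winter months only.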
The relationship between the QBO and SSWs was later widely used to evaluate the dynamic consequences of SSWs and their interaction with the adjacent atmospheric layers. SSW events are strongly related to the phases of the QBO [18]. Depending on the QBO phase, the stability of the polar vortex and the likelihood of SSW events are modified. Typically, the westerly phase of the QBO (QBO+) keeps the vortex stable, preventing the development of SSWs. On the other hand, the easterly phase of the QBO (QBO−) likely increases wave activity and the probability of SSW occurrence [4,19]. Such interactions highlight a robust coupling between the tropical stratosphere and polar vortex disruptions. Beyond the modulation of the stratospheric waveguide, the persistence of the polar vortex is further influenced by teleconnections from the Madden–Julian Oscillation (MJO) and El Niño–Southern Oscillation (ENSO), which exhibit varying signatures across climate models [20,21]. Integrating these diverse drivers into stratospheric diagnostics is essential for reducing uncertainties in seasonal forecasts [22]. Although the precise dynamical pathways of this tropical–polar teleconnection remain a subject of active research, the statistical relationship between the two is well-documented.
While several well-established definitions for SSW events exist, most notably the Charlton and Polvani criteria [15] based on Uh reversals, the historical absence of a single, universally accepted classification, with variable criteria based on wind reversals and Ts anomalies, has led to conflicting results regarding SSW frequency and intensity in the literature [2]. This diversity in criteria necessitates the use of more physically meaningful and integrated metrics, such as the Threshold Exceedance Area (TEA) and Main Phase Strength (MPS) metrics employed in this study, to capture the full structural and energetic evolution of the events. Butler and Gerber [23] found that nine different major SSW definitions exist, and based on these definitions, annual SSW frequency varied between 40% and 90%. These large differences indicate that standardization and physically meaningful metrics are needed. Butler and Gerber [23] stated that the SSW definition should be independent of the specific dataset and must capture the physical and dynamical characteristics of SSWs.
New approaches to the definition of SSWs have been proposed recently. Li et al. [2] introduced the TEA concept together with the Main Phase Duration (MPD) and Main Phase Area (MPA) metrics, which are directly related to SSW intensity and allow a quantitative assessment. Their approaches are adapted in this study. These concepts are applied to specific time periods (historical and future) to study SSW events in both Ts and Uh, together with reference data from the Fifth Generation European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis (ERA5) and five large-scale model simulations.
The main objective of this study is to investigate, using SSW definition criteria, the relationship between QBO variability and SSW events during the historical (1980–2005) and future (2006–2100) periods. To situate this study within the existing literature, it is important to note that while recent multi-model assessments [24,25,26] suggest that the frequency of major SSWs will remain relatively stable in the 21st century, the physical and energetic fidelity of these events remains a critical research gap. Specifically, Ayarzagüena et al. [24] reported no robust evidence of future changes in the frequency or timing of major SSWs across different climate models. Similarly, Rao and Garfinkel [25] concluded that the statistical characteristics of SSWs and the strength of their stratosphere–troposphere coupling show negligible changes in the 21st century. Additionally, Chávez-Pérez et al. [26] highlighted that while long-term averages appear to be stable, there is significant temporal variability in the distribution of events, which suggests that long-term aggregated assessments might mask changes in sub-periods. Our aim is to move beyond simple frequency counts, a technical aspect also addressed in the context of wave forcing spread by [27], and to evaluate the evolving spatial and energetic intensity of SSWs using the TEA and MPS metrics under various threshold criteria for SSW definition. Furthermore, in the context of our main objective, the QBO–SSW coupling is assessed quantitatively by comparing stratospheric variability at 10° N (serving as a tropical QBO proxy) and 60° N (polar response) to identify whether model biases are rooted in tropical forcing or in its high-latitude transmission. In this regard, the current work differs significantly from the studies of [24,25,26] and places more emphasis on the structural and energetic sensitivity of SSW analysis to threshold criteria.
The dataset obtained using the Coupled Model Intercomparison Project Phase 5 (CMIP5) model simulations and high-resolution ERA5 reanalysis is detailed in Section 2. Section 3 presents the methodology, including statistical diagnostics and the TEA metric for SSW detection. Section 4 analyzes the comparative outcomes of past and future SSW characteristics while evaluating the quantitative link between tropical forcing and the polar vortex. Lastly, Section 5 and Section 6 offer a discussion of the findings and a conclusion.
2. Data
The datasets used in this study are obtained from ERA5 and from CMIP5 simulations under the Representative Concentration Pathway 4.5 (RCP4.5) scenario. The RCP4.5 scenario assumes that greenhouse gas emissions stabilize in the second half of the 21st century. ERA5 provides data at relatively high spatial (0.25° × 0.25°) and temporal (hourly) resolution and is used as the verification reference dataset for the historical period (1980–2005) [28]. Recent studies have validated the utility of ERA5 reanalysis data for developing atmospheric models with parameter correction methodologies [29,30]. While ERA5 is recognized for reliably capturing planetary wave activity [31], addressing model-specific biases remains essential for a rigorous evaluation of simulated stratospheric variability [32].
The CMIP5 models provide daily outputs for both the historical (1980–2005) and future (2006–2100, RCP4.5) periods. Table 1 provides a technical comparison of the five models initially screened for this study. While all five models were considered, our methodology prioritizes models based on their vertical architecture, as detailed below. The models are as follows: M1 (ACCESS1-3), developed by the Australian Bureau of Meteorology and CSIRO; M2 (HadGEM2-CC), from the UK Met Office; M3 (MPI-ESM-MR), from the Max Planck Institute for Meteorology in Germany; M4 (GFDL-ESM2G), from NOAA/Geophysical Fluid Dynamics Laboratory in the United States; and M5 (FGOALS-g2), from LASG/IAP in China [33].
Despite the emergence of the Coupled Model Intercomparison Project Phase 6 (CMIP6), the CMIP5 model ensemble remains a suitable framework for this study as it provides a reliable baseline for diagnosing deep-seated structural and energetic biases. This selection is supported by recent literature, such as Rao and Garfinkel (2021) [25], which demonstrates no statistically significant differences in SSW frequency or basic characteristics between CMIP5 and CMIP6. Most importantly, studies by Hall et al. (2021) [34] and Karpechko et al. (2024) [35] confirm that systemic biases in stratospheric polar vortex variability have remained essentially unchanged across model generations. Furthermore, recent findings by Martínez-Andradas et al. (2025) [27] highlight that CMIP6 models continue to exhibit large spreads and persistent biases in SSW wave forcing and intensity. Therefore, utilizing CMIP5 allows for an objective comparison with established literature while addressing fundamental challenges common to both ensembles. Future work will extend this diagnostic framework to include the CMIP6 ensemble.
M1, M2, and M3 are designated as the primary group due to their extended vertical domains and enhanced resolution in the middle-to-upper stratosphere. These structural advantages facilitate a more realistic simulation of vertical wave–mean flow coupling, which is fundamental for capturing both the life cycle of SSW events and the periodic variability of the QBO. In contrast, M4 and M5 are assigned to the secondary group because their shallower model tops restrict their capacity to consistently replicate QBO–SSW interactions (Table 1). This methodological focus prevents potential bias in the analysis stemming from ‘low-top’ models, which are known to systematically underestimate stratospheric variability because of their inability to resolve vertical wave–mean flow coupling. Accordingly, results from M1, M2, and M3 constitute the basis of the main interpretation, whereas M4 and M5 remain beyond the scope of the present analysis.
The classification of models follows earlier evaluations of CMIP5 models [36,37] and emphasizes the importance of model top height in the simulation of stratospheric variability. The models are selected based on their temporal coverage, data availability, and ability to resolve pressure levels up to 10 hPa, providing high-resolution daily data for the detection of SSW events [18].
3. Method
To investigate SSW events in the stratosphere, daily averages, variability, and long-term trends of Ts (STs) and Uh (SUh) at the 10 hPa pressure level for the NH are examined. The data consist of ERA5 reanalysis and simulations from five CMIP5 models (M1–M5), but only the first three are used here because of their capability to resolve SSWs. The ERA5 dataset (originally at hourly resolution) is averaged to daily means to match the temporal resolution of the models.
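The hourly-to-daily averaging step can be sketched as follows. This is only an illustration under simple assumptions: the input is a list of timestamps and a parallel list of values for one grid point, and each calendar day is averaged over whatever hourly samples it contains. The function name `daily_means` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def daily_means(times, values):
    """Average hourly samples into one mean value per calendar day."""
    buckets = defaultdict(list)
    for t, v in zip(times, values):
        buckets[t.date()].append(v)
    return {day: fmean(vs) for day, vs in sorted(buckets.items())}


# Two hypothetical days of hourly 10 hPa temperatures (K).
start = datetime(1980, 1, 1)
times = [start + timedelta(hours=h) for h in range(48)]
values = [200.0] * 24 + [210.0] * 24
print(daily_means(times, values))
```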
The current study covers two time periods: (1) the historical period from 1980 to 2005, and (2) the future period from 2006 to 2100 under the RCP4.5 emission scenario. SSW detection relies on the TEA method, in which a 30 K anomaly threshold must be exceeded for at least six consecutive days. Detected events are classified as minor, major, or extreme events using the MPS metric, which combines MPD and MPA. Although more than 60% of SSWs typically occur during January–February, the analysis covers the November–April window to capture the full seasonal range of occurrences. The SSW characteristics related to different QBO phases are also emphasized in the analysis. A description of the TEA method and its metrics is provided in the following section.
Following earlier classifications [23,37], the models used in this work are grouped as higher-top or lower-top models according to the vertical extent of the model domain. Higher-top models (M1, M2, and M3) extend into the upper stratosphere or lower mesosphere (above ~1 hPa), allowing a more realistic representation of planetary wave propagation, vortex breakdown, and associated SSW dynamics. Lower-top models (M4 and M5) terminate at 10–30 hPa, which limits vertical wave–mean flow coupling and leads to a systematic underestimation of stratospheric variability. Although M1 is included in the analysis, it should be noted that its top is lower than those of the other higher-top models (M2 and M3), which may result in an underestimation of major and extreme events.
On the other hand, due to their limited capacity in resolving stratospheric wave–mean flow interactions and the resulting insufficient number of detected SSW events for reliable statistical evaluation, M4 and M5 are removed from the analysis. Consequently, while M1 is retained for general frequency and time-series comparisons (Section 4.1, Section 4.2 and Section 4.3), its limited capacity to resolve coherent thermal–dynamical feedback necessitates its exclusion from the detailed amplitude and coupling diagnostics presented in Section 4.4, Section 4.5 and Section 4.6. This selective approach ensures that the high-fidelity assessment of tropical–polar coupling is based on models with adequate vertical resolution, prioritizing physical consistency over model quantity.
3.1. Profiling Temperature Anomalies for SSW Detection Using the TEA Method
To identify SSW events, the analysis uses the TEA technique [2], in which ΔTs must remain above 30 K for at least six consecutive days. This approach was originally introduced by [23] and was subsequently adapted by Palmeiro et al. [38] and Garfinkel et al. [39]. In the analysis, the 30 K threshold at 10 hPa is used to identify large-scale stratospheric warming events, a criterion consistent with earlier studies [9,17,40].
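The persistence criterion (ΔTs above 30 K for at least six consecutive days) can be sketched as a simple run-length search. The sketch assumes a single daily anomaly series is being tested; how the anomaly field is reduced to such a series is not specified here. The function name `find_events` and the sample series are hypothetical.

```python
def find_events(anom, threshold=30.0, min_days=6):
    """Return inclusive (start, end) index pairs of runs in which the
    anomaly stays above `threshold` for at least `min_days` days."""
    events = []
    start = None
    for i, a in enumerate(anom):
        if a > threshold:
            if start is None:
                start = i  # a new exceedance run begins
        else:
            if start is not None and i - start >= min_days:
                events.append((start, i - 1))
            start = None
    # Handle a run that is still open at the end of the series.
    if start is not None and len(anom) - start >= min_days:
        events.append((start, len(anom) - 1))
    return events


# Hypothetical daily anomalies (K): a 2-day spike, a 6-day run, a 1-day spike.
anom = [5, 31, 33, 12, 35, 36, 38, 41, 37, 32, 10, 31]
print(find_events(anom))
```

Only the six-day run qualifies as an event here; the shorter exceedances are treated as fluctuations that fail the duration criterion.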
Following established definitions based on McInturff [9], Labitzke [17], Baldwin et al. [40], and Baldwin & Thompson [41], SSWs are identified when Ts anomalies at 10 hPa exceed 30 K. This threshold ensures consistency with prior studies while isolating large-scale warming episodes. As outlined in Table 2, Ts anomalies are derived by subtracting the long-term climatic average of Ts (Tsa) from the daily mean Ts.
To account for model-specific mean-state biases, the baseline climatology is established independently for ERA5 and each CMIP5 simulation, a procedure essential for evaluating SSW events in climate models [32]. For the historical period (1980–2005), anomalies are calculated relative to the daily mean of that specific timeframe. Similarly, future anomalies (2006–2100) are derived from a distinct baseline established over the entire scenario window. For instance, the anomaly for 1 January 2050 is obtained by subtracting the climatological mean of all 1 January data points within the 2006–2100 period. This period-specific baseline approach isolates episodic dynamical warmings from their respective ‘climate normals’. By treating the historical and future eras independently, we ensure that anomalies represent episodic dynamical forcing rather than the background signal of long-term greenhouse-induced stratospheric cooling. This use of fixed, period-specific baselines aligns with standard diagnostic choices in major multi-model assessments [24], providing a consistent reference level to evaluate changes in dynamical variability relative to the projected mean state of the 21st century stratosphere.
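The period-specific baseline can be illustrated with a short sketch that subtracts, for each date, the mean of all values sharing the same calendar day within the period supplied. It assumes the caller passes one period (historical or future) and one dataset at a time, so that each gets its own climatology. The function name `daily_anomalies` and the sample temperatures are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_anomalies(dates, temps):
    """Subtract the calendar-day climatology (mean over all years in
    the supplied period) from each daily value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, t in zip(dates, temps):
        key = (d.month, d.day)
        sums[key] += t
        counts[key] += 1
    clim = {k: sums[k] / counts[k] for k in sums}
    return [t - clim[(d.month, d.day)] for d, t in zip(dates, temps)]


# Hypothetical 1 January temperatures (K) from three years of one period.
dates = [date(2049, 1, 1), date(2050, 1, 1), date(2051, 1, 1)]
temps = [210.0, 240.0, 210.0]
anoms = daily_anomalies(dates, temps)
print(anoms)
```

With only three years the climatology is of course not meaningful; the point is that the 2050 value is compared against the mean of all 1 January values in the same period rather than against a historical baseline.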
The calculated anomalies that are used to detect SSW events are provided in Table 2. The TEA method distinguishes between persistent SSW events and fluctuations that do not meet the duration criterion. Then, the temporal length and geographic extent of each event are assessed separately. When ΔTs remains below 30 K, no SSW event is recorded. This aligns with previous studies of Charlton & Polvani [15] and Blume et al. [42]. Start and end dates are assigned to all SSW events in the analyzed period. After SSW detection, the latitudinal and longitudinal coordinates are used to calculate the surface area influenced by anomalies exceeding the threshold values.
The spatial coverage of each SSW event is determined by summing the surface areas of all grid cells in which ΔTs exceeds the 30 K threshold. This sum gives the daily TEA value, which quantifies the daily spatial extent of the SSW.
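A minimal sketch of the daily area summation is given below. It assumes a regular latitude–longitude grid, computes each cell area on a sphere from the cell's latitude bounds, and uses the mean Earth radius of 6371 km; whether the original method uses this radius or a different cell-area formula is not stated, so these choices are assumptions. The names `cell_area_km2` and `daily_tea` and the sample grid are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def cell_area_km2(lat_center, dlat, dlon):
    """Area of a lat-lon cell on a sphere: R^2 * dlon * (sin(lat2) - sin(lat1))."""
    lat1 = math.radians(lat_center - dlat / 2.0)
    lat2 = math.radians(lat_center + dlat / 2.0)
    return EARTH_RADIUS_KM ** 2 * math.radians(dlon) * (math.sin(lat2) - math.sin(lat1))


def daily_tea(lats, anom_grid, dlat, dlon, threshold=30.0):
    """Sum the areas of all cells whose anomaly exceeds the threshold.
    anom_grid[i][j] is the anomaly at latitude lats[i], longitude index j."""
    total = 0.0
    for lat, row in zip(lats, anom_grid):
        n_exceed = sum(1 for a in row if a > threshold)
        total += n_exceed * cell_area_km2(lat, dlat, dlon)
    return total


# Hypothetical coarse polar-cap grid: 3 latitude bands x 4 longitude sectors.
lats = [65.0, 75.0, 85.0]
anom_grid = [
    [12.0, 31.0, 35.0, 8.0],
    [33.0, 41.0, 38.0, 29.0],
    [45.0, 47.0, 44.0, 40.0],
]
print(daily_tea(lats, anom_grid, dlat=10.0, dlon=90.0) / 1e6, "x 10^6 km^2")
```

Weighting by true cell area matters near the pole, where cells of equal angular size cover much less surface than at lower latitudes.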
3.2. MPD, MPA, and MPS Metrics
Standardized metrics of MPD, MPA, and MPS are used to evaluate SSW events in both observations (ERA5) and numerical model simulations (M1–M3). Table 3 presents the frequency, magnitude, and spatial distribution of SSWs. These metrics are derived using the TEA method and jointly account for the duration and spatial coverage of each event. MPD refers to the number of consecutive days with ΔTs > 30 K, while MPA represents the average daily surface area affected by the ΔTs > 30 K threshold [2].
MPS, representing the magnitude of each warming event in space and time, is calculated as the product of MPD and MPA. This metric allows us to discriminate between events with different durations and spatial extents. SSW episodes are then categorized as minor (<90 × 10⁶ km²·day), major (90–180 × 10⁶ km²·day), or extreme (>180 × 10⁶ km²·day) events according to their MPS values [2]. Table 3 shows the sampling days from the model simulations and ERA5 analysis that cover the 1980–2005 time period. This classification characterizes the frequency and intensity of SSWs as well as stratospheric variability and model accuracy. Trend analyses of Ts and Uh are also performed to support the areal coverage statistics for (i) all days, (ii) SSW days, and (iii) each intensity classification level (minor, major, and extreme events).
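The MPS calculation and the three intensity classes can be sketched as below, using the thresholds quoted above. The function name `classify_event` and the example event are hypothetical, and the treatment of values exactly at 90 or 180 × 10⁶ km²·day (assigned here to the major class) is an assumption.

```python
def classify_event(mpd_days, mpa_km2):
    """Return (MPS, class) for an event, where MPS = MPD x MPA in km^2 day."""
    mps = mpd_days * mpa_km2
    if mps < 90e6:
        label = "minor"
    elif mps <= 180e6:
        label = "major"
    else:
        label = "extreme"
    return mps, label


# Hypothetical event: 8 days with a mean exceedance area of 15 x 10^6 km^2.
mps, label = classify_event(mpd_days=8, mpa_km2=15e6)
print(mps / 1e6, label)
```

Because MPS is a product, a short event covering a large area and a long event covering a small area can fall into the same class.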
The TEA method is used to analyze the relationship between Uh variability and Ts-based SSW types. The last phase of the analysis uses Uh data averaged over the 10° S–10° N latitude band at the 10 hPa level to focus on the QBO. This analysis considers the QBO features, including seasonal patterns, amplitude variations, phase transitions, positive and negative phases, and structural variations between models.
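A tropical-band average of this kind can be sketched as follows. The sketch assumes zonal-mean Uh values on a set of latitudes and applies cosine-latitude weighting, which is a common convention but is not stated in the text; the sign of the result is then read as the QBO phase. The names `qbo_index` and `qbo_phase` and the sample winds are hypothetical.

```python
import math


def qbo_index(lats, u_zonal_mean, lat_limit=10.0):
    """Cosine-latitude weighted mean of zonal wind within +/- lat_limit degrees."""
    num = 0.0
    den = 0.0
    for lat, u in zip(lats, u_zonal_mean):
        if abs(lat) <= lat_limit:
            w = math.cos(math.radians(lat))
            num += w * u
            den += w
    return num / den


def qbo_phase(index):
    """Westerly (QBO+) for positive index values, easterly (QBO-) otherwise."""
    return "westerly" if index > 0.0 else "easterly"


# Hypothetical zonal-mean winds (m/s) at 10 hPa on five latitudes.
lats = [-20.0, -10.0, 0.0, 10.0, 20.0]
u = [50.0, -10.0, -20.0, -10.0, 50.0]
idx = qbo_index(lats, u)
print(round(idx, 2), qbo_phase(idx))
```

Values outside the band (here the two 50 m/s entries) do not enter the average, so strong extratropical winds cannot contaminate the tropical index.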
To investigate oscillation behavior and possible connections between QBO and SSW events, time series analyses are utilized. In addition, to quantify intra-annual variability and long-term changes in Ts and Uh, three statistical diagnostics were derived from the monthly mean series. First, range-based variability for Ts (Trng) and Uh (Urng) was computed within a 12-month moving window as the monthly maximum–minimum spread divided by 12. Second, the amplitude of variability for Ts (Tamp) and Uh (Uamp) was defined as the standard deviation (sd) of the same 12-month windowed values. Third, long-term tendencies were expressed through trend slopes, defined in their basic finite-difference form as
\[ S_X = \frac{\Delta X}{\Delta t}, \]
where X denotes the respective variable (Ts or Uh) and Δt the elapsed time. For robustness, we also estimated slopes using ordinary least-squares (OLS) regression on monthly means; these trends are denoted as STs and SUh, reported in K yr⁻¹ and m s⁻¹ yr⁻¹, respectively. These specific notations (Trng, Urng, Tamp, Uamp, etc.) are used consistently throughout the analysis.
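The three diagnostics can be sketched in a few lines. The sketch assumes a plain list of monthly means, slides a 12-month window one month at a time, and uses the population standard deviation; the sample standard deviation would be an equally plausible reading of the text. The OLS slope is written out explicitly rather than taken from a library. The names `window_diagnostics` and `ols_slope` and the synthetic series are hypothetical.

```python
import statistics


def window_diagnostics(monthly, window=12):
    """Return (range/12, sd) for every 12-month moving window."""
    rng, amp = [], []
    for i in range(len(monthly) - window + 1):
        w = monthly[i:i + window]
        rng.append((max(w) - min(w)) / window)  # max-min spread divided by 12
        amp.append(statistics.pstdev(w))        # population sd (assumed variant)
    return rng, amp


def ols_slope(t, x):
    """Ordinary least-squares slope of x against t."""
    t_mean = statistics.fmean(t)
    x_mean = statistics.fmean(x)
    num = sum((ti - t_mean) * (xi - x_mean) for ti, xi in zip(t, x))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


# Synthetic two-year monthly series cooling at exactly 0.5 K per year.
t_years = [m / 12 for m in range(24)]
ts_monthly = [220.0 - 0.5 * ti for ti in t_years]
trng, tamp = window_diagnostics(ts_monthly)
print(len(trng), ols_slope(t_years, ts_monthly))
```

Expressing time in years makes the slope directly comparable with the K yr⁻¹ and m s⁻¹ yr⁻¹ units used for STs and SUh.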
4. Results
This section evaluates SSW events with respect to their frequency, energy intensity, and temporal evolution, while examining their relationship with the QBO. The results are provided for two distinct periods: (1) the historical era (1980–2005) and (2) the future climate (2006–2100) under the RCP 4.5 emission scenario. Relating the occurrence of SSWs to QBO phase transitions, seasonal patterns, and long-term variability enables a comprehensive characterization of stratospheric dynamics. The results are presented to show the differences between the ERA5 reanalysis (as reference) and the CMIP5 model simulations.
The following subsections present detailed results from the selected model outputs and time series analysis, focusing on the Uh and Ts parameters.
4.1. Selection of Numerical Models
To evaluate the CMIP5 models’ response to SSWs, the frequency and total duration of events are compared against ERA5 (Figure 1). In the historical period (1980–2005), ERA5 identified 29 SSW events lasting a total of 337 days. Of these days, 37% belonged to minor events (126 days), 17% to major events (57 days), and 46% to extreme events (154 days). The mean and sd of event duration were 11.6 ± 4.1 days, highlighting significant variability in event persistence. As clearly illustrated in Figure 1, the M4 and M5 models (secondary group) demonstrate a systematic failure to capture SSW dynamics, identifying fewer than three events throughout the historical period. This deficiency is primarily attributed to their low-top configurations, which limit their ability to resolve vertical wave–mean flow coupling and stratospheric variability. Due to this limited capacity and the resulting insufficient number of events for robust statistical evaluation, M4 and M5 were excluded from the detailed analytical diagnostics. Consequently, the primary analysis focuses on the high-top models M1, M2, and M3, which better simulate the dynamical and thermodynamical processes of the stratosphere. The November–April window was selected for the primary analysis as it encompasses more than 90% of total SSW days in both ERA5 and the models, providing a statistically representative seasonal range.
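The ERA5 shares and mean duration quoted above follow directly from the day counts, as the short check below shows. It uses only the numbers given in the text (29 events; 126, 57, and 154 days) and treats the percentages as shares of SSW days, which is the reading consistent with those counts.

```python
# Day counts per intensity class for ERA5, 1980-2005, as quoted in the text.
days = {"minor": 126, "major": 57, "extreme": 154}
n_events = 29

total_days = sum(days.values())
shares = {k: round(100 * v / total_days) for k, v in days.items()}
mean_duration = round(total_days / n_events, 1)

print(total_days, shares, mean_duration)
```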
Figure 2 shows the temporal and seasonal distribution of SSWs during the historical period (1980–2005) for ERA5 and the models (M1–M3), while Figure 3 shows the corresponding distributions under the RCP 4.5 scenario (2006–2100). The M1, M2, and M3 models are selected for the SSW analysis because of their better handling of the dynamical and thermodynamical processes, as discussed below. The November–April time frame was chosen to provide a consistent physical and statistical framework for evaluating how well the models capture mid-winter warming events.
According to the ERA5 analysis, 62% of the SSWs occurred in January–February, 31% in November–December, and 7% in March–April. M1 closely reproduced ERA5’s pattern, with corresponding values of 58%, 34%, and 8%. M2 generated corresponding values of 49%, 39%, and 12%. M3 produced 64% for January–February, 28% for November–December, and 8% for March–April, matching the observed seasonal phases most closely. The mean event durations for M1, M2, and M3 were 9.2 ± 3.3, 8.7 ± 3.7, and 11.0 ± 3.9 days, respectively. Overall, M1 and M3 reproduce the ERA5 seasonality to within ±5 percentage points, whereas M2 exhibits a systematic bias toward short-lived and early winter warmings.
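Seasonal shares of this kind can be computed by binning events into the three two-month windows. The sketch below assumes each event is assigned to a window by the calendar month of its onset; the text does not state whether shares are counted by events or by days, so this is one possible reading. The function name `seasonal_shares` and the list of onset months are hypothetical.

```python
from collections import Counter

# Map calendar months to the two-month windows used in the analysis.
WINDOWS = {11: "Nov-Dec", 12: "Nov-Dec", 1: "Jan-Feb",
           2: "Jan-Feb", 3: "Mar-Apr", 4: "Mar-Apr"}


def seasonal_shares(onset_months):
    """Percentage of events whose onset falls in each window."""
    counts = Counter(WINDOWS[m] for m in onset_months if m in WINDOWS)
    total = sum(counts.values())
    return {w: 100.0 * counts.get(w, 0) / total
            for w in ("Nov-Dec", "Jan-Feb", "Mar-Apr")}


# Hypothetical onset months for ten events.
onset_months = [1, 1, 2, 12, 2, 1, 11, 3, 2, 1]
shares_by_window = seasonal_shares(onset_months)
print(shares_by_window)
```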
In summary, the comparison between historical (1980–2005) and future (2006–2100) periods under the RCP 4.5 scenario indicates that the frequency of SSW events remains relatively stable. No statistically significant increasing or decreasing trend was observed across the M1, M2, and M3 ensembles, suggesting that while the climate warms, the occurrence rate of these events does not deviate substantially from historical averages. This finding is in line with the statistical projections provided by Rao and Garfinkel (2021) [25].
However, while the recent literature indicates that SSW frequency shows little robust change under future scenarios, our results provide evidence that future climate change is closely associated with significant systematic biases in the simulated physical and dynamical intensity of these events. These biases, characterized by a 61% to 82% underestimation of warming magnitude, demonstrate that the primary impact of climate change in models is the misrepresentation of the energetic structure of SSWs, rather than a shift in their occurrence rate.
Future projections under RCP 4.5 (Figure 3) show how the simulated SSW seasonality evolves through the 21st century. Based on the same identification and classification criteria applied to the historical period, the events remain confined to the November–April period. M1 shows 53.4% of events in January–February, 27.2% in November–December, and 18.4% in March–April, which is similar to the historical period, with a mean duration of 11 ± 5.1 days. M2 simulates 67.1% for January–February, 0% for November–December, and 32.9% for March–April, showing an early winter shift of approximately 10% and producing the shortest mean event duration of 10.7 ± 3.7 days. M3 maintains 65.8% for January–February, 26.1% for November–December, and 8.1% for March–April, with a mean event duration of 9.8 ± 3.4 days.
4.2. Temperature Time Series
In this section, Ts at 10 hPa 60° N latitude is plotted and interpreted using ERA5 reanalysis and CMIP5 models (M1–M3), along with historical period and RCP 4.5 projection time series.
4.2.1. Historical Period of Ts
Daily temperature series at 10 hPa and 60° N latitude for the period 1980–2005 are analyzed to assess how effectively three CMIP5 models (M1–M3) reproduce the observed stratospheric thermal variability (Figure 4). Statistical assessments of Ts, sd, percentile ranges (Pr), and STs are summarized in Table 4.
According to ERA5 (Figure 4a), the polar stratosphere exhibits a weak but persistent cooling trend throughout the historical period, while displaying significant seasonal and interannual thermal variability. As shown in Table 4, temperature variance decreases by approximately 46% during SSW events compared to non-SSW periods. This decrease indicates that temperatures cluster within a narrower range during midwinter warmings, reflecting a significant thermal response.
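A variance comparison between SSW and non-SSW days can be sketched as below. The sketch assumes a daily series with a parallel list of flags marking SSW days and uses the population variance; the exact variance estimator behind the reported percentages is not stated, so this is an assumption. The function name `variance_reduction_pct` and the sample values are hypothetical.

```python
from statistics import pvariance


def variance_reduction_pct(values, is_ssw):
    """Percentage by which the variance on SSW days is lower than
    the variance on non-SSW days."""
    ssw = [v for v, flag in zip(values, is_ssw) if flag]
    non = [v for v, flag in zip(values, is_ssw) if not flag]
    return 100.0 * (1.0 - pvariance(ssw) / pvariance(non))


# Hypothetical daily temperatures (K): four non-SSW days, then four SSW days.
values = [200.0, 210.0, 220.0, 230.0, 238.0, 240.0, 242.0, 240.0]
is_ssw = [False] * 4 + [True] * 4
print(variance_reduction_pct(values, is_ssw))
```

A positive result means that values are more tightly grouped on SSW days than on other days; a negative result would indicate the opposite.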
The M1, M2, and M3 simulations are evaluated against ERA5. M1 (Figure 4b) overestimates the thermal range, producing a higher sd and a wider Pr than ERA5. It also overestimates the long-term cooling trend by a factor of about 2.9. M2 (Figure 4c) suppresses thermal variability, with a narrower Pr. However, it overestimates the cooling rate, indicating an inconsistent representation of stratospheric dynamics. M3 (Figure 4d) is the most comparable with ERA5 and closely reproduces the magnitude of the mean Ts, sd, and the long-term cooling trend (Table 4).
In summary, all models successfully capture the thermodynamic signature of SSWs. The reduction in Ts variability (sd) across all datasets ranges from 44% to 54% (Table 4). This confirms that SSW events play an important role in driving polar cooling processes.
4.2.2. RCP 4.5 Future Climate Scenarios of Ts
In this section, the simulations are analyzed to determine how RCP 4.5 forcing modifies the characteristics of SSWs, including Ts, sd, Pr, and STs (Table 5). As shown in Table 5, long-term cooling trends remain weak and negative, suggesting that polar stratospheric cooling will continue moderately throughout the 21st century.
M1 (Figure 5a) closely maintains its historical thermal structure but suggests significant stability in the long-term trend: the rate of cooling is about 76% weaker than in the historical baseline. In contrast, M2 (Figure 5b) shows slightly more variability with a narrower Pr. Similar to M1, its cooling rate is 60% weaker than in the historical simulation. The M3 (Figure 5c) results for the future scenario are similar to the historical ERA5 climate. The reduction in Ts variance during SSW events is estimated at about 49–52% for all models (Table 5), and the metrics are comparable across models. This consistency indicates that the polar vortex continues to experience warming events with a thermal structure consistent with that of the historical period.
4.3. Zonal Wind Time Series
In this section, Uh at 10 hPa and 60° N latitude is plotted and interpreted using ERA5 reanalysis and CMIP5 models (M1–M3), for both the historical period and the RCP 4.5 projection. SSW characteristics are discussed in terms of Uh, following the structure used for Ts in the previous section.
4.3.1. Historical Period of Uh
In this subsection, daily Uh time series at the 10 hPa level at 60° N latitude for the period 1980–2005 are analyzed to assess how effectively the CMIP5 models (M1–M3) reproduce SSW dynamical variability and to evaluate the polar night jet (Figure 6). The assessment, using mean Uh, sd, SUh, and Pr representing dynamic amplitude, is presented in Table 6.
According to ERA5 (Figure 6a), polar stratospheric winds exhibit significant seasonal and interannual variability throughout the historical period, accompanied by significant slowing of westerly winds. The observed dynamics are a direct response to SSW activity. As shown in Table 6, Uh variance during SSW events decreases by 24% compared to non-SSW periods.
When the M1 model is examined (Figure 6b), the polar night jet’s intensity is overestimated, producing an amplified range of variability compared to ERA5 results. It also simulates a spurious strengthening of westerly Uh, which is contrary to the observed deceleration (Table 6). M2 (Figure 6c) reproduces the dynamic amplitude and mean state reasonably well but, like M1, fails to capture the direction of the long-term trend. M2 shows a systematic trend reversal, resulting in an artificial westerly Uh acceleration of the polar night jet. M3 (Figure 6d) provides the results in closest agreement with the ERA5 reanalysis, reproducing the magnitude and variability structure of the mean Uh with the highest success among all models (Table 6). However, M3 also shows a positive Uh trend, although it is weaker than that of M1 and M2.
Despite the inconsistencies in long-term trends, all three models capture the dynamic signature of SSWs. The reduction in Uh variance is found to be between 13 and 54% (Table 6). This confirms that vortex breakdowns lead the stratosphere into a state of reduced dynamic variability. However, the magnitude of this clustering varies significantly depending on the model.
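The variance reduction quoted above can be illustrated with a short sketch. It assumes the metric is the percentage drop in sample variance on SSW days relative to non-SSW days; the exact definition used for Table 6 is not restated in this section, and the function name and the data below are hypothetical.

```python
from statistics import variance


def variance_reduction_pct(values, is_ssw):
    """Percent reduction in sample variance on SSW days vs. non-SSW days.

    Assumed definition: 100 * (1 - var_ssw / var_non_ssw).
    """
    ssw_vals = [v for v, flag in zip(values, is_ssw) if flag]
    non_vals = [v for v, flag in zip(values, is_ssw) if not flag]
    return 100.0 * (1.0 - variance(ssw_vals) / variance(non_vals))


# Synthetic daily zonal winds (m/s); the last four days are flagged as SSW days.
uh = [30.0, 10.0, 40.0, 20.0, 2.0, -2.0, 4.0, 0.0]
ssw = [False, False, False, False, True, True, True, True]
print(variance_reduction_pct(uh, ssw))
```

A positive result means the wind distribution is narrower on SSW days; a negative result would indicate larger variance during SSWs.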
4.3.2. RCP 4.5 Future Climate Scenarios of Uh
This subsection presents future climate scenarios based on SSW characteristics. Under the RCP 4.5 scenario, daily Uh time series at 10 hPa and 60° N for the period 2006–2099 are analyzed to assess how the predicted circulation changes compare with the historical polar night jet stream intensity (Figure 7). The metrics of Uh, sd, Pr, and SUh are summarized in Table 7.
In M1 simulations (Figure 7a), the mean Uh shows increasing variability, with a slightly wider Pr (70.44 m.s⁻¹), a change of about 8.5% relative to the historical average value. However, in the long-term trend, SUh weakens by about 82% compared to the historical period, becoming almost zero (Table 7). In M2 simulations (Figure 7b), the average Uh shows a modest 4.2% increase compared to its historical value, but Pr decreases. SUh shifts from a historically positive value to a weakly negative value (−0.004 m.s⁻¹.yr⁻¹). This represents a large relative decrease in the trend value, indicating the complete disappearance of the historical strengthening trend (Table 7). In M3 simulations (Figure 7c), an average circulation is simulated with a slightly narrow amplitude range that remains unchanged from its historical value. The SUh trend weakens by about 48%, but it is not statistically significant (Table 7). M3 continues to exhibit distinct behavior during extreme events: winds during SSW days increase significantly compared to the historical baseline value during SSW days.
Overall, regarding the dynamic response to SSWs, variability continues to narrow across the models compared to non-SSW conditions. As shown in Table 7, the variance reduction ranges from 4% to 43%, indicating that mid-winter warming remains abrupt and dynamically consistent under moderate future forcing. Under RCP 4.5 scenarios, each model preserves the underlying Uh climatology while clearly reducing the historical strengthening trends of the polar night jet. Specifically, none of the RCP 4.5 slopes deviates significantly from zero, implying a statistically stationary mean flow behavior throughout the 21st century. M3 provides the most consistent statistical representation of polar night jet variability as suggested by ERA5, while M1 provides a stronger mean flow. On the other hand, M2 provides weaker but more stable circulation.
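As an illustration of what "does not deviate significantly from zero" can mean for a trend slope, the sketch below fits an ordinary least-squares line to annual values and reports the slope, its standard error, and the t-statistic. This is an assumed procedure, not necessarily the test used for Table 7: it ignores serial correlation, uses |t| < 1.96 as a rough large-sample threshold, and all names and data are hypothetical.

```python
import math


def trend_with_t_stat(years, values):
    """OLS slope per year, its standard error, and t = slope / se."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    se = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, se, t_stat


# Synthetic annual-mean winds for 2006-2099: constant 20 m/s with +/-1 m/s alternation.
years = list(range(2006, 2100))
flat = [20.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(len(years))]
slope, se, t_stat = trend_with_t_stat(years, flat)
print(slope, se, t_stat)  # |t| well below 1.96: no detectable trend
```

For daily or strongly autocorrelated series, the effective sample size is smaller than n, so a test of this kind would overstate significance unless it is adjusted.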
4.4. Stratospheric Temperature Amplitude and Range
As justified in Section 3, while models M1, M4, and M5 were utilized for general frequency comparisons, the following high-fidelity assessments of stratospheric amplitude and thermal–dynamical coupling (Section 4.4, Section 4.5 and Section 4.6) focus exclusively on models M2 and M3 to ensure physical consistency in capturing the magnitude of stratospheric variability. The Tamp and Trng are evaluated to highlight the SSW-related thermal variability based on daily temperature values shown in Figure 4 and Figure 5, respectively. Table 8 summarizes the mean values of Tamp and Trng, their sd, and STs for the ERA5 reanalysis and CMIP5 models (M2 and M3) at 10° N and 60° N during the historical and future periods. The daily values of the above parameters are averaged over the targeted years to obtain monthly averages.
In the tropical stratosphere (10° N), ERA5 characterizes thermal variability as stable, with low sd and a negligible long-term trend in Ts; however, the CMIP5 models exhibit deviations. M2 underestimates Trng by approximately 21% (Table 8). M3, contrary to previous evaluations, also underestimates these metrics, capturing about 91–92% of the observed variability; it agrees much more closely with ERA5 than M2 does but retains a slight negative bias.
In the polar stratosphere, the observed Tamp variability at 60° N increases significantly, reflecting the enhanced wave activity and polar vortex dynamics associated with SSWs. ERA5 exhibits the strongest variability, with Trng of 0.730 K and Tamp of 3.159 K. However, all models struggle to reproduce this polar Tamp magnitude. Table 8 shows that M2 captures only ~20% of the observed Trng and underestimates Tamp by ~81%. This suggests that polar Tamp variability is reduced significantly. M3, similarly to M2, struggles to reproduce the polar magnitude, resolving only ~26% of Trng and underestimating Tamp by ~75%. Based on the long-term evolution of Tamp, linear trends remain small for all data, reflecting model-dependent suppression of the processes rather than a real climate change signal. Under the RCP 4.5 future climate scenario, the Tamp variability structure is generally conserved at all latitudes. Deviations in projected Tamp and Trng are found to be minimal for both models, varying by less than 4% relative to historical baselines. Furthermore, future linear trends, with |STs| < 0.001, are found to be negligible.
In summary, ERA5 exhibits the strongest Ts variability across all latitudes (Table 8). In contrast, M2 systematically underestimates both Tamp and Trng by about 20% and 80%, respectively, whereas M3 underrepresents polar amplitude by approximately 50%. Long-term STs values are found to be insignificant, indicating that there is no significant change in the SSW occurrence conditions. These findings suggest that while the tropical–polar contrast is preserved under RCP 4.5, the CMIP5 simulations cannot reproduce the magnitude of Ts variability, indicating limitations in model vertical resolution and wave–mean-flow coupling.
4.5. Zonal Wind Amplitude and Range
The Uamp and Urng variables for Uh are analyzed at 10° N and 60° N latitudes at 10 hPa to evaluate the dynamical variability associated with SSW events, based on the daily evolution shown in Figure 6 and Figure 7. This comparison between tropical and polar dynamics provides a quantitative basis for evaluating the QBO–SSW coupling. Table 9 provides statistical measures, including the mean, median, sd, Urng, Uamp, and long-term tendencies for the ERA5 reanalysis and CMIP5 model simulations (M2 and M3) during the historical and future periods.
In the tropical stratosphere (10° N), which acts as a proxy for the tropical QBO signal, ERA5 shows an average Urng of 3.021 m.s⁻¹ with a weak positive multi-decadal SUh. The models show contrasting abilities in reproducing these dynamics (Table 9). M2 follows ERA5 closely, with differences of ~2% and ~8% in Urng and Uamp, respectively. In contrast to previous expectations, M3 also aligns reasonably well with ERA5 in the tropics, showing only a slight overestimation of ~1.5% in Urng and ~5.6% in Uamp. M2 reproduces the stable long-term trends found in ERA5, while M3 exhibits a clear increase in SUh, which is consistent with its pronounced westerly bias.
In the polar stratosphere (60° N), ERA5 shows stronger dynamic variability, reflecting the active wave–mean flow interactions related to mid-latitude westerly winds. Neither M2 nor M3 can capture the full magnitude of this polar variability (Table 9). M2 underestimates Urng by about 60% and captures only 36% of the observed Uamp, indicating a significant suppression of polar jet dynamics. The discrepancy between the relatively lower bias in tropical amplitudes and the significant underestimation at 60° N suggests that the model deficiency is primarily associated with the representation of dynamic teleconnections linking tropical forcing to the polar stratosphere. M3 performs similarly to M2 in this updated analysis, capturing only ~39% and ~37% of Urng and Uamp, respectively. ERA5 shows a slight strengthening (positive SUh) of the polar westerly Uh. Both M2 and M3 simulate negative trends (−0.0018 and −0.0080 m.s⁻¹ decade⁻¹, respectively) and thus fail to reproduce the observed strengthening of the polar Uh.
Under the RCP 4.5 future climate scenarios, the dynamic structure, represented by Uh metrics, remains generally stable across all latitudes. The changes in Uamp and Urng (Table 9) are less than 4% of the historical baselines. In the tropics, M2 shows a slight increase in variability (~1%) but M3 shows a small decrease (~6%). At 60° N, both models retain their historical biases, and M2 continues to significantly underestimate Uamp (~4.1 m.s⁻¹) compared to ERA5 (~11.4 m.s⁻¹). The model-based projected long-term trends become statistically insignificant because SUh becomes close to zero. This suggests a lack of robust dynamic evolution in Uh variability throughout the 21st century.
In summary, ERA5 exhibits stronger dynamic variability than the CMIP5 models, particularly in the polar region (Table 9). M2 systematically underestimates both Urng and Uamp by about 40–70%, but M3 overestimates the tropical variability of Uamp while underrepresenting polar Uamp by about 25%. Discrepancies in the sign and magnitude of long-term Uamp trends indicate that the models inadequately capture the dynamic response of the polar night jet to interannual dynamical forcing. These persistent deviations in both historical and future simulations likely reflect structural limitations in vertical resolution as well as wave-flow coupling conditions, but this needs to be verified in future studies. Overall, these results imply that while models generate tropical forcing, the primary systematic error lies in the dynamic coupling necessary to propagate these signals poleward.
4.6. Thermal–Dynamical Coupling Analysis
To investigate the thermal–dynamical coupling in the stratosphere, scatter plots between daily mean Ts and Uh at 10 hPa are examined at 60° N latitude for both historical and future periods. Figure 8 corresponds to the historical period, and Figure 9 depicts projections under the RCP 4.5 scenario.
At 60° N latitude, ERA5 analysis (Figure 8a) shows that Uh increases with decreasing Ts, suggesting that warmer Ts is associated with weaker westerlies (or easterlies), whereas decreasing Ts is related to stronger westerlies. Both fits have similar slopes, separated at lower temperatures. The fit for the pre-1990 data has a slope of −1.62 m.s⁻¹ K⁻¹ (r = −0.59; R² = 0.35). After 1990, the slope weakens to −1.13 m.s⁻¹ K⁻¹ (r = −0.49; R² = 0.24), corresponding to a roughly 30% reduction in magnitude. Overall, ERA5 shows a persistent inverse coupling that weakens in the later decades.
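The ERA5 figures quoted above can be cross-checked with simple arithmetic: the relative change in slope magnitude, and the identity R² = r² that holds for a simple linear regression with one predictor. The helper name below is hypothetical; the numbers are those reported in the text.

```python
def pct_reduction_in_magnitude(before, after):
    """Percent reduction in absolute value from `before` to `after`."""
    return 100.0 * (abs(before) - abs(after)) / abs(before)


# Values reported for ERA5 at 60° N (pre- and post-1990 fits).
slope_pre, slope_post = -1.62, -1.13   # m/s per K
r_pre, r_post = -0.59, -0.49           # Pearson correlation coefficients

print(pct_reduction_in_magnitude(slope_pre, slope_post))  # about 30%
print(r_pre ** 2, r_post ** 2)                             # about 0.35 and 0.24
```

Both checks agree with the reported values after rounding to the stated precision.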
At 60° N latitude, M2 (Figure 8b) exhibits a much stronger and more linear inverse Ts–Uh relationship than ERA5. Both the pre- and post-1990 subsets yield regression slopes close to −2.0 m.s⁻¹ K⁻¹ (r ≈ −0.82; R² ≈ 0.65–0.68), with only minor differences between the two periods, indicating that the model maintains a persistently intense anticorrelation rather than the weakening seen in ERA5. The scatter of data points is tightly clustered along the regression axis, especially in the 220–235 K Ts range, where many SSW days occur. In addition, M2 produces a substantial number of cases with Uh < −20 m.s⁻¹, which are rare in ERA5, suggesting that the model overproduces very weak or reversed vortex states and exaggerates the dynamical response to warming anomalies.
In M3 simulations (Figure 8c), the overall Ts–Uh coupling is found to be negative with a weaker coherence than that of both ERA5 and M2. Regression slopes remain almost unchanged between the two periods (−1.40 to −1.42 m.s⁻¹ K⁻¹), implying a stable yet moderately underestimated inverse coupling with no clear post-1990 weakening. The scatter shows a broader spread in Uh (approximately −10 to 40 m.s⁻¹), indicating that M3 simulates a wider range of vortex states but with reduced sensitivity of Uh anomalies to Ts variations. Clustering in the 220–235 K band is less pronounced than in ERA5, and the model underestimates the amplitude of the thermal–dynamical link by about 15–20%.
For the RCP 4.5 projections, the Ts–Uh coupling for four successive time segments, (i) 2006–2025, (ii) 2026–2050, (iii) 2051–2075, and (iv) 2076–2100, is shown using distinct color coding, with solid trend lines representing the least-squares fit for each period (Figure 9). This consistent visualization enables a direct comparison of reanalysis-based and model-simulated coupling characteristics across historical and future conditions. Results for each time period are provided below.
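A minimal sketch of the per-segment least-squares fit described above is given here, assuming each daily sample is available as a (year, Ts, Uh) tuple and that Uh is regressed on Ts. The function names, the record layout, and the synthetic data are hypothetical; only the segment boundaries are taken from the text.

```python
from math import sqrt

# Time segments (i)-(iv) as listed in the text, inclusive of both end years.
PERIODS = [(2006, 2025), (2026, 2050), (2051, 2075), (2076, 2100)]


def fit_coupling(ts, uh):
    """Return (slope, r, R^2) for the least-squares fit of Uh on Ts."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_u = sum(uh) / n
    stt = sum((t - mean_t) ** 2 for t in ts)
    suu = sum((u - mean_u) ** 2 for u in uh)
    stu = sum((t - mean_t) * (u - mean_u) for t, u in zip(ts, uh))
    slope = stu / stt                # m/s per K
    r = stu / sqrt(stt * suu)        # Pearson correlation
    return slope, r, r * r


def fit_by_period(records, periods=PERIODS):
    """Fit each segment separately; `records` holds (year, ts, uh) tuples."""
    results = {}
    for start, end in periods:
        subset = [(t, u) for year, t, u in records if start <= year <= end]
        if len(subset) >= 3:         # skip segments with too few samples
            ts, uh = zip(*subset)
            results[(start, end)] = fit_coupling(ts, uh)
    return results


# Synthetic, perfectly linear example covering segment (i) only.
records = [(2006 + i, 215.0 + i, 50.0 - 2.0 * i) for i in range(20)]
print(fit_by_period(records))
```

A negative slope with |r| close to 1 corresponds to a tight inverse coupling; a slope near zero with small |r| corresponds to the weak late-century coupling described in the following paragraphs.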
For time periods i, ii, and iv, M2 exhibits a clear inverse Ts–Uh relationship, with a slope of −1.84 m.s⁻¹ K⁻¹ (r = −0.42; R² = 0.18) (Figure 9a). The mean Ts is 224.8 ± 6.3 K and the mean Uh is 17.4 ± 19.2 m.s⁻¹, indicating a canonical warm–weak jet pattern. The slope steepens to −2.65 m.s⁻¹ K⁻¹ (r = −0.56; R² = 0.31) in 2026–2050, representing the period when the vortex is most susceptible to thermal anomalies. In 2051–2075, the relationship weakens substantially, with the slope flattening to −1.32 m.s⁻¹ K⁻¹ (r = −0.27; R² = 0.07), and by 2076–2100, it decreases further to −0.74 m.s⁻¹ K⁻¹ (r = −0.17; R² = 0.03). Although the average Ts rises by about 1 K over the time period, the Uh pattern remains predominantly westerly. Overall, M2 maintains the correct sign of the coupling. However, towards the end of the century, it indicates a gradual decrease in dynamic strength and consistency.
For time periods ii, iii, and iv, M3 shows a moderate inverse coupling with a slope of −1.21 m.s⁻¹ K⁻¹ (r = −0.36; R² = 0.13), mean values of Ts = 225.1 ± 6.5 K, and Uh = 20.6 ± 18.7 m.s⁻¹ (Figure 9b). The coupling strengthens during 2026–2050, as the slope steepens to −2.08 m.s⁻¹ K⁻¹ (r = −0.47; R² = 0.22) and reaches maximum intensity in 2051–2075 (slope = −2.42 m.s⁻¹ K⁻¹; r = −0.52; R² = 0.27), reflecting the period of strongest polarity between warming anomalies and response. In the final segment (2076–2100), the slope weakens sharply to −0.63 m.s⁻¹ K⁻¹ (r = −0.19; R² = 0.04), accompanied by increased scatter and higher mean Ts (~227 K). Thus, M3 shows a transient mid-century strengthening followed by a late-century weakening, indicating a non-monotonic dynamical response to RCP 4.5 forcing.
Collectively, Figure 8 and Figure 9 demonstrate a robust and persistent inverse coupling between Ts and Uh in the polar stratosphere at 60° N across both the historical and RCP 4.5 periods. In historical records, ERA5 exhibits strong negative slopes and high correlation magnitudes, with a clear weakening after 1990, indicating reduced dynamical coherence of the polar vortex in recent decades. Both CMIP5 M2 and M3 models reproduce the correct sign of the coupling, albeit with distinct biases: M2 consistently amplifies the slope and correlation strength relative to ERA5, whereas M3 captures the overall structure but underestimates the dynamical amplitude.
5. Discussion
This study investigated the relationship between SSWs and the QBO using ERA5 reanalysis and CMIP5 model simulations, in both historical and future periods. Comparison of SSW frequency and intensity between ERA5 and the CMIP5 models revealed important differences, which are briefly discussed here.
In the RCP 4.5 scenario, M1 and M3 maintained almost constant event frequencies (±5%), but M2 showed a tendency for events to occur in the early winter months, with a decrease of approximately 10%. These differences are consistent with the outputs of CMIP5 models, but vertical resolution and planetary wave propagation characteristics directly affect eddy distortion frequency and intensity. In this respect, CMIP5 simulations need to focus more on turbulent heat fluxes in high resolution mode. Note that M4 and M5 are not analyzed because of their inability to generate realistic SSWs (fewer than 3 events).
Analysis of Ts and Uh variability revealed structural biases in how the models preserved the polar vortex rather than in event frequency. Historically, ERA5 exhibits a sustained cooling trend together with a slowing of the westerlies. In contrast, M1 spuriously simulates a strengthening of the polar night jet, while M2 suppresses dynamic variability, systematically reducing Tamp and Urng magnitudes in the polar region by about 60–80%. M3 more accurately captures the variability in the structure of system dynamics and thermodynamics, reproducing approximately 75% of the observed amplitude of Tamp. However, M2 significantly overestimates tropical variability. Furthermore, the reduction in variance during SSW events acts as a key indicator of vortex breaking across the models (~24–54%). This confirms that the thermodynamic response of the SSW event remains robust despite mean-state deviations. These shortcomings in reproducing the magnitude of polar variability point to limitations of the models in representing intrinsic dynamic forcing. These persistent biases reflect broader challenges identified in both CMIP5 and CMIP6 ensembles regarding stationary wave driving [34,35]. While high-top configurations in newer model generations show improvements in stratospheric dynamics [43], low-top models like M4 and M5 remain essential for demonstrating the structural sensitivity of the polar vortex to model vertical extent.
The QBO processes are analyzed in determining the frequency and intensity of SSWs due to their effect on planetary wave propagation and background horizontal wind shear. Both QBO− and QBO+ phases affect the upward transmission of Rossby waves into the polar vortex, altering the vertical refractive index of the tropical stratosphere [4,44]. Previous studies using reanalysis have shown that SSWs occur twice as often in QBO− phases as in QBO+ phases [10,15,45]. This reflects that when upward wave flow increases, the polar vortex is further disrupted. In the CMIP5 models, QBO dynamics are not explicitly resolved because tropical momentum forcing is parameterized rather than dynamically generated. Of the three selected models, only M2 and M3 exhibit intermittent equatorial wind direction changes resembling quasi-biennial behavior. However, these oscillations are weak and irregular. M1 shows a limited semi-annual oscillation in the lower stratosphere. This is a known limitation of the upper–lower CMIP5 configurations [36]. Because this study focuses only on the 10 hPa level and the QBO is not dynamically expressed, direct QBO–SSW coupling cannot be retrieved. Therefore, the interpretation relies on established dynamical pathways and previous intercomparisons.
M2 predicts fewer SSW days and a narrower Urng, while M3 shows a modest upward trend with slightly stronger event classification. The contrast between the monotonic attenuation in M2 and the non-monotonic transient peak in M3 highlights significant structural ambiguities in the representation of future wave–mean flow feedback under radiative forcing. Phase-space analyses of Ts–Uh relationships further emphasize these contrasts, showing a weakening of the post-1990 inverse correlation in ERA5 (r ≈ −0.8). This is partly captured by M2 simulations but remains inconsistent with M3. This inconsistency suggests that CMIP5 models may have overestimated the dynamic stability of the polar vortex, as they generally support a strong linear response, in contrast to the recent ‘decoupling’ seen in the reanalysis data.
The biases in the representation of stratospheric variability in CMIP5 models remain persistent, and significant differences occur compared to the ERA5 reanalysis. M1 underrepresents strong SSW events; M2 underestimates both dynamical and physical variability overall. On the other hand, M3 reproduces the observed temporal structure of both Uh and Ts metrics. However, specifically in the polar region, it significantly underestimates the magnitudes of both Tamp and Uamp, similarly to M2. The systematic underestimation of SSW intensity in models has direct implications for projected surface weather impacts. Since the strength of the downward coupling is physically linked to the magnitude of the stratospheric warming, an 80% bias in temperature anomalies suggests that models may fail to capture the full severity of mid-latitude cold spells and storm track shifts that typically follow these events. Therefore, although the frequency of SSWs is projected to be stable, their simulated impact on surface weather conditions is likely underestimated due to these persistent dynamical biases. Additionally, the energetic influence of SSWs extends into the upper atmosphere, potentially modulating broader oscillations such as the Semi-Annual Oscillation (SAO) [46]. ERA5 therefore provides a consistent observational benchmark for assessing these differences in both historical and future contexts, but better observations of the upper atmosphere, based on in situ measurements and satellite data, could improve the outcome of this work. These results suggest a need to improve the model physics of QBO–SSW interaction, planetary wave propagation dynamics, and mean flow interactions, to increase the reliability of future stratospheric climate projections.