Evaluation of the Antarctic Circumpolar Wave Simulated by CMIP5 and CMIP6 Models

As a coupled large-scale oceanic and atmospheric pattern in the Southern Ocean, the Antarctic circumpolar wave (ACW) has substantial impacts on the global climate. In this study, using the European Centre for Medium-Range Weather Forecasts ERA5 dataset and historical experiment outputs from 24 models of the Coupled Model Intercomparison Project Phase 5 and Phase 6 (CMIP5/CMIP6) spanning the 1980s and 1990s, we evaluate how well the models simulate the sea-level pressure (SLP) and sea surface temperature (SST) variability of the ACW. Most models capture the 50-month period of the ACW well. However, many simulations show a weak amplitude and various phase differences. Selected models can simulate SLP better than SST, and CMIP6 models generally perform better than the CMIP5 models. The best model for SLP simulation is the CanESM5 model from CMIP6, whereas the best model for SST simulation is the ACCESS1.3 model from CMIP5. It seems that the SST simulation benefits from the inclusion of both a carbon cycle process and a chemistry module, while the SLP simulation benefits only from the chemistry module. When both SLP and SST are taken into consideration, the CanESM5 model performs the best among all the selected models.


Introduction
The Antarctic circumpolar wave (ACW) is a large-scale air-sea interactive phenomenon in the Southern Hemisphere, first named by White and Peterson in 1996 [1]. It manifests as an eastward-propagating signal around the Antarctic in sea-level pressure (SLP) [1-4], sea surface temperature (SST) [1-7], sea-ice extent/sea-ice concentration (SIE/SIC) [1,8,9], sea surface height (SSH) [5,10], meridional wind speed (MWS) [1,6,11,12], wind stress [5,10], surface air temperature (SAT) [9,11,13] and salinity [3,11,13], with a mean speed of 6-8 cm s⁻¹ and a period of 4-5 years. As a major component of the Southern Ocean (SO), the ACW plays an important role in Southern Hemisphere and global climate change and has drawn wide interest around the world [14]. In the past two decades, many researchers have investigated its origin, maintenance mechanism, development, variation and impact on both regional and global climate via statistical diagnostics and numerical modeling. As a result, various hypotheses and theoretical models have been developed. Qiu and Jin [15] used a coupled model to show that the instability of the coupled ocean-atmosphere interaction is the main source of the ACW. Nevertheless, the ACW in SST anomalies is principally modulated by the El Niño-Southern Oscillation (ENSO) teleconnections [11,16]. It has been suggested that ENSO is

Taylor Analysis
The Taylor analysis is widely used to evaluate model simulations against reference data. It draws a diagram that provides a concise statistical summary of how well patterns match each other in terms of their correlations, their root-mean-square differences and the ratios of their variances [37]. A skill score S can then be calculated for each model from these statistics, where R is the pattern correlation between the reference and the simulation, σ is the simulated standard deviation divided by that of the reference (also called the normalized standard deviation) and R₀ is the maximum attainable correlation coefficient (here set to 1). S ranges from 0 to 1. The closer S is to 1, the better the simulation skill of the model.
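The skill score can be illustrated with a short sketch. The exact expression is not reproduced in the text above, so the code assumes a form commonly attributed to Taylor's skill score, S = 4(1 + R)^p / [(σ + 1/σ)² (1 + R₀)^p], with p = 1 by default and p = 4 as a frequent alternative that penalizes low correlations more strongly. The function names, the `power` argument and the unweighted (non-area-weighted) statistics are illustrative assumptions, not the authors' code.

```python
import math


def taylor_statistics(reference, simulated):
    """Return (R, sigma_hat) for two equally long sequences of grid values.

    R is the centered pattern correlation and sigma_hat is the simulated
    standard deviation divided by the reference one. No area weighting is
    applied here (an assumption made to keep the sketch short).
    """
    n = len(reference)
    if n != len(simulated) or n < 2:
        raise ValueError("fields must have the same length (at least 2 points)")
    ref_mean = sum(reference) / n
    sim_mean = sum(simulated) / n
    ref_dev = [r - ref_mean for r in reference]
    sim_dev = [s - sim_mean for s in simulated]
    ref_std = math.sqrt(sum(d * d for d in ref_dev) / n)
    sim_std = math.sqrt(sum(d * d for d in sim_dev) / n)
    covariance = sum(a * b for a, b in zip(ref_dev, sim_dev)) / n
    return covariance / (ref_std * sim_std), sim_std / ref_std


def taylor_skill(R, sigma_hat, R0=1.0, power=1):
    """Assumed skill-score form: 4(1+R)^p / ((sigma + 1/sigma)^2 (1+R0)^p)."""
    numerator = 4.0 * (1.0 + R) ** power
    denominator = (sigma_hat + 1.0 / sigma_hat) ** 2 * (1.0 + R0) ** power
    return numerator / denominator


# Hypothetical example: a simulation with the right pattern but doubled amplitude.
R, sigma_hat = taylor_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
score = taylor_skill(R, sigma_hat)
```

With R₀ = 1, both choices of p give S = 1 for a perfect match (R = 1, σ = 1) and S = 0 for R = -1, which agrees with the stated range of 0 to 1. A wrong amplitude lowers the score even when the correlation is perfect.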

ACW Extraction
There are several definitions of the ACW; here we followed White and Peterson [1]. First, monthly anomalies were calculated to remove the seasonal cycle, and then a 3-7-year bandpass filter was applied to remove fast variations and long-term signals. To avoid boundary effects, the 4 years of data before and after the study period were omitted. One small difference is that we used a Butterworth filter, which has a flat amplitude response function and is commonly used in ACW studies [16,31,38,39]. The empirical orthogonal function (EOF) analysis was applied to the filtered monthly anomalies, and the first mode (EOF1) represents the ACW signal [16,25]. Moreover, a spectrum analysis was carried out to identify a clear period.
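Two of these steps, removing the seasonal cycle and extracting EOF1, can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the processing chain actually used: the input is a hypothetical (time, space) array of monthly values covering whole years, the EOF is obtained by singular value decomposition without latitude weighting, and the bandpass step is not shown (a Butterworth design such as `scipy.signal.butter` followed by zero-phase filtering would sit between the two functions).

```python
import numpy as np


def monthly_anomalies(field):
    """Remove the mean seasonal cycle from a (time, space) monthly array.

    Assumes the record spans a whole number of years.
    """
    n_time, n_space = field.shape
    if n_time % 12 != 0:
        raise ValueError("expected a whole number of years of monthly data")
    by_year = field.reshape(n_time // 12, 12, n_space)
    climatology = by_year.mean(axis=0)  # mean for each calendar month
    return (by_year - climatology).reshape(n_time, n_space)


def leading_eof(anomalies):
    """Return (pattern, pc, explained_fraction) of the first EOF via SVD."""
    centered = anomalies - anomalies.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s[0] ** 2 / np.sum(s ** 2)  # share of total variance
    pc = u[:, 0] * s[0]                     # EOF1 time series
    return vt[0], pc, explained


# Hypothetical synthetic data: a seasonal cycle plus a 48-month standing signal.
months = np.arange(240)
lon = np.linspace(0.0, 2.0 * np.pi, 36, endpoint=False)
seasonal = 3.0 * np.cos(2.0 * np.pi * months / 12.0)[:, None] * np.ones(36)
signal = np.sin(2.0 * np.pi * months / 48.0)[:, None] * np.cos(2.0 * lon)[None, :]
pattern, pc, explained = leading_eof(monthly_anomalies(seasonal + signal))
```

The sign of an EOF pattern and its time series is arbitrary, so comparisons between a model and the reference should fix a sign convention first. A purely propagating wave also tends to split its variance between a pair of EOFs in quadrature, which is one reason EOF1 alone presents the ACW as a standing pattern.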
In the following section, results from the reference data, the model with the highest skill score, the unweighted average of the CMIP5 and CMIP6 model ensembles and the models in CMIP5 and CMIP6 with the highest correlation with the reference data are shown.

Taylor Analysis of Model Simulations
The Taylor analysis was applied to select the well-simulated SLP and SST fields for further evaluation. Figure 1a shows the results for SLP. Among the 12 CMIP5 models, the normalized standard deviations ranged from 0.9 to 1.1, except for the IPSL-CM5A-LR and BCC-CSM1.1 models. The BCC-CSM1.1 model simulated a weaker centered SLP variance, with a normalized standard deviation of less than 0.9, while the IPSL-CM5A-LR model simulated a stronger centered SLP variance, with a normalized standard deviation near 1.2. Correlation coefficients between all the models and the reference data were larger than 0.3. The CESM1-CAM5 model had the highest S score while the GISS-E2-H model had the lowest. Earth system models show no obvious advantage over physical climate system models (Figure 2a). For the 12 CMIP6 models, the normalized standard deviations ranged from 0.92 to 1.12, except for the CESM2 model, whose value was slightly larger. All CMIP6 models' correlation coefficients were larger than 0.3. The FIO-ESM-2-0 model had the highest S score while the CNRM-CM6-1 model had the lowest. Models with an atmospheric chemistry module had a higher S score than the other models, with the exception of the CNRM-CM6-1 model (Figure 2a). Compared with the CMIP6 models, the CMIP5 models generally had poorer correlation coefficients and a larger spread among models. The CMIP6 model ensemble was superior to the CMIP5 model ensemble, but not at all times. The FIO-ESM-2-0 model had the highest S score among all 24 models and was thus considered the best in simulating the reference SLP.
Same as Figure 1a but for SST, Figure 1b shows that 5 of the 12 CMIP5 models had normalized standard deviations of less than 1, meaning that the centered SST variances simulated by these models were weaker than that of the reference data. All CMIP5 models had correlation coefficients over 0.75, among which the FIO-ESM model had the highest, over 0.9. Moreover, the FIO-ESM model had the highest S score while the GFDL-CM3 model had the lowest, and Earth system models seemed to surpass physical climate system models (Figure 2b). Among the 12 CMIP6 models, the CNRM-CM6-1, GISS-E2-1-H and IPSL-CM6A-LR models simulated stronger centered SST variances than the reference data. In contrast, those in the other models were weaker, with the weakest being the CESM2 model, with a normalized standard deviation of less than 0.75. Correlation coefficients of four models (GFDL-CM4, CanESM5, GFDL-ESM4 and HadGEM3-GC31-LL) exceeded 0.9, and those of all 12 models were over 0.76. The GFDL-CM4 model had the highest S score while the ACCESS-CM2 model had the lowest. Except for the CESM2 model, all models with a chemistry module performed better than those without (Figure 2b). The CMIP6 model ensemble also had lower normalized standard deviations.
Figure 1. Taylor diagrams of (a) sea-level pressure (SLP) and (b) sea surface temperature (SST) simulated by the 12 CMIP5 models and 12 CMIP6 models with ERA5 as the reference data.
Atmosphere 2020, 11, x; doi: FOR PEER REVIEW www.mdpi.com/journal/atmosphere
In general, both the CMIP5 and CMIP6 models can better simulate SST than SLP. The CMIP6 model ensemble has little advantage over the CMIP5 model ensemble. A chemistry module seems to improve the performance of both SLP and SST simulations. In contrast, the carbon cycle process brings benefits only to the SST simulation.

Evaluation of Model-Simulated SLP
The time-longitude diagrams show the evolution of the filtered SLP signal from the reference data (Figure 3a), the best simulation, from the FIO-ESM-2-0 model selected via the Taylor analysis (Figure 3b), the unweighted averages of the CMIP5 (Figure 3c) and CMIP6 (Figure 3e) model ensembles, the CESM1-CAM5 model simulation selected from CMIP5 (Figure 3d) and the CanESM5 model simulation selected from CMIP6 (Figure 3f). There was an obvious interannual oscillation at a particular longitude, and a peak-to-valley pattern appeared in parallel at the same time. This ACW-like pattern shows a dominant eastward propagation, which was most intense and clear in the Pacific sector.
The pattern had two wavelengths encircling the globe, with an average period of 4-5 years, which is consistent with previous studies [1,28]. Compared with White and Peterson [1], our result had a 15% (0.5 hPa) weaker fluctuation amplitude. Taking into account the sensitivity of the result to different data and filters, it is reasonable to conclude that the reference data reproduced the ACW well during our study period. The selected CESM1-CAM5, FIO-ESM-2-0 and CanESM5 models basically captured the wave pattern and propagation trend of the reference result. In contrast, the CMIP5 and CMIP6 model ensembles barely show a propagating wave-like pattern. The mutual cancellation of the different simulated ACW signals may be responsible for this poor result from the unweighted mean. Among the CMIP5 models, Earth system models show no notable difference from the physical climate models. However, several models performed poorly, such as the BCC-CSM1.1 model, which reproduced a random meridional distribution and barely half of the peak value. Although it shows a small eastward-propagating trend, it is among the worst simulations once its vague wave pattern is considered. Moreover, instead of a propagating pattern, the FGOALS-g2 result shows a zonally uniform interannual oscillation, which differed greatly from the ACW. Among the CMIP6 models, the CESM2 and HadGEM3-GC31-LL models could not reproduce the wave pattern or propagation trend of the ACW (figure omitted).
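The cancellation argument for the unweighted ensemble mean can be made concrete with a small sketch. It assumes an idealized case that is not taken from the model output: every ensemble member is a sinusoid with the same period and amplitude, and the members differ only in phase. The function name and the example phases are hypothetical.

```python
import cmath
import math


def ensemble_mean_amplitude(phases, amplitude=1.0):
    """Amplitude of the unweighted mean of equal-period sinusoids.

    Each member is amplitude * sin(omega * t + phase). Averaging the members
    is equivalent to averaging their unit phasors, so the amplitude of the
    mean is the member amplitude times the length of the mean phasor.
    """
    mean_phasor = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return amplitude * abs(mean_phasor)


# Twelve hypothetical members that agree in phase keep the full amplitude.
in_phase = ensemble_mean_amplitude([0.0] * 12)

# Twelve members with phases spread evenly around the cycle cancel almost exactly.
spread_phases = [2.0 * math.pi * j / 12 for j in range(12)]
spread = ensemble_mean_amplitude(spread_phases)
```

Real ensembles fall between these two limits, so a weak or missing wave in the ensemble mean does not by itself imply that the individual members lack an ACW-like signal.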
It is interesting to notice an abnormal westward propagation between 0 and 90°W (South Atlantic, including the Drake Passage region) in many models, especially in the physical climate models, which cannot be seen in the reference data. According to White and Peterson [1], the ACW SST anomalies originate in the western subtropical South Pacific and propagate eastward, which is mainly due to the Antarctic circumpolar current (ACC). The ACC in the Drake Passage region is extremely strong, with a velocity of 3-15 cm s⁻¹ that would affect the ACW signal propagation in model simulations [27] and may block the eastward propagation in a specific region [13]. Moreover, the ACW signal tends to be much weaker in the South Atlantic sector [31]. Therefore, there may be large uncertainties for models in simulating the ACC or the mechanisms of how the ACC influences the ACW, making the simulated westward propagation trend rather dubious [40].
To further investigate the ability of the models to simulate the SLP signal, an EOF analysis was applied to the filtered series mentioned above. Figure 4 shows the first EOF mode representing the ACW. As the main mode, the EOF1 of the reference data explained 41.9% of the total variance, and its time series took the general form of three complete wavelengths (Figure 4a). This is consistent with the result of Bian and Lin [25], in which the EOF1 explains 33% of the total variance. The CESM1-CAM5 model from CMIP5 explained a little less than the reference data (Figure 4e), while for other models, such as the FIO-ESM model, the explained variance exceeded 75% (figure omitted). All models had clear fluctuations but varied considerably in wavenumbers and phases (Figure 4d). Furthermore, by calculating the correlation coefficients between the time series of the reference data and the model simulations (results omitted), four models (ACCESS1.3, CESM1-CAM5, BCC-CSM1.1 and FIO-ESM) were found to be significantly positively related to the reference data. The CESM1-CAM5 model had the largest correlation coefficient, over 0.85. However, it failed to reproduce the first ACW in the early 1980s. Among the CMIP6 models, CanESM5's EOF1 explained 65% of the variance (Figure 4g) and had a positive correlation coefficient of 0.99. Models with an atmospheric chemistry module seem to have a higher positive correlation than those without, although they show no advantage in terms of explaining the total variance. Compared with CMIP5 models, CMIP6 models show no improvement in terms of explained variance but had better performance in fluctuation manifestation.
The spectrum analysis of the EOF1 time series was employed to extract the clear period of the ACW (Figure 5). Seven out of 12 CMIP5 models show a nearly 50-month period of the ACW, similar to the reference data (passing the 95% red noise confidence level). Among the other five models, the BCC-CSM1.1, GFDL-ESM2G and IPSL-CM5A-LR models had a major period of 40 months, while the CNRM-CM5 and GFDL-CM3 models contained two signals with 30-month and 50-month periods. Except for the FGOALS-g3 and GFDL-CM4 models, which had a 40-month period, all CMIP6 models also show a 50-month period signal. Compared with the CMIP5 models, the CMIP6 models reproduce the ACW period better, but the carbon cycle process and chemistry module make no contribution to the better period performance.
Figure 5. Spectra for bandpass-filtered first EOF time series of SLP from the ERA5 dataset (a) and selected models (red line is the red noise, and dashed lines are the 5% and 95% confidence bounds). Models (b-f) are the same as Figure 3 (b-f).
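A spectrum with a red-noise reference of this kind can be sketched as below. The estimator behind Figure 5 is not described in the text, so the code rests on assumptions: a raw periodogram, a first-order autoregressive (AR(1)) null spectrum built from the lag-1 autocorrelation of the series and scaled to the same mean power, and an upper bound from the chi-square distribution with 2 degrees of freedom. The function name and the synthetic series are hypothetical.

```python
import numpy as np


def spectrum_with_red_noise(series, dt=1.0, confidence=0.95):
    """Return (freqs, power, red, upper) for a 1-D series sampled every dt.

    power is the raw periodogram (zero frequency dropped), red is the AR(1)
    spectrum scaled to the same mean power, and upper is the one-sided
    confidence bound for a periodogram ordinate under the red-noise null.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = x.size
    freqs = np.fft.rfftfreq(n, d=dt)[1:]
    power = np.abs(np.fft.rfft(x))[1:] ** 2 / n
    r1 = np.sum(x[:-1] * x[1:]) / np.sum(x * x)  # lag-1 autocorrelation
    shape = (1.0 - r1 ** 2) / (
        1.0 - 2.0 * r1 * np.cos(2.0 * np.pi * freqs * dt) + r1 ** 2
    )
    red = shape * power.mean() / shape.mean()
    # The chi-square quantile with 2 degrees of freedom, divided by 2,
    # equals -ln(1 - confidence).
    upper = -np.log(1.0 - confidence) * red
    return freqs, power, red, upper


# Hypothetical monthly series: a 50-month oscillation plus weak white noise.
months = np.arange(300)
rng = np.random.default_rng(0)
series = np.sin(2.0 * np.pi * months / 50.0) + 0.1 * rng.standard_normal(300)
freqs, power, red, upper = spectrum_with_red_noise(series)
peak = int(np.argmax(power))
dominant_period = 1.0 / freqs[peak]  # in months, because dt is one month
```

Because a 3-7-year bandpass filter leaves a strongly autocorrelated series, the fitted red-noise spectrum is steep at low frequencies, and smoothing or tapering the periodogram would change the degrees of freedom used for the bound.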
Hence, we concluded that three models could simulate the ACW SLP signal well: the FIO-ESM model from CMIP5, and the CanESM5 and GFDL-ESM4 models from CMIP6.
Same as Figure 3, Figure 6 shows the spatiotemporal evolution of SST from the reference data (Figure 6a), the best-performing model, GFDL-CM4, selected according to the Taylor analysis (Figure 6b), the unweighted averages of the CMIP5 (Figure 6c) and CMIP6 (Figure 6e) model ensembles, the ACCESS1.3 model simulation selected from CMIP5 (Figure 6d) and the CESM2 model simulation selected from CMIP6 (Figure 6f). The reference data result was similar to the SLP features, showing an obvious interannual oscillation at a particular longitude and a peak-to-valley pattern in parallel at the same time. The wave pattern propagated eastward and took about 8 years to encircle the Antarctic. Furthermore, it reconfirmed the finding of previous studies that the ACW is strongest in the South Pacific sector [11,38,41] and provided confidence in the ability of the reference data to capture real ACW features. Among the unsatisfactory simulations, the BCC-CSM1.1 model had a small eastward-propagating trend but a fragmentary wave pattern, and the amplitude in the GISS-E2-H model was 50% smaller than in the reference data. The ACCESS1.3 model represents the SST signal well in the eastern hemisphere but blurred the propagation trend in the western hemisphere. It took 6 years for the signal to encircle the Antarctic in the CESM1-CAM5 model and 4 years in the CanESM2 model. In contrast, the CMIP6 models were much improved in representing the wave pattern and propagation trend. The best result, from the GFDL-CM4 model, was similar to the reference data, with a two-wavelength parallel form and an 8-year encircling period. The CESM2 model performed well in the western hemisphere but had a small amplitude in the eastern hemisphere, therefore blurring the ACW. Other models had various problems, such as a one-third loss in amplitude for the HadGEM2-CC model and a quasi-biennial oscillation instead of the ACW for the BCC-CSM2-MR model (figure omitted).
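The encircling time and the period seen at a fixed longitude can be related by a simple kinematic check. The relation below assumes an idealized wave that propagates steadily with an integer zonal wavenumber k; the symbols T_circ (time for one crest to circle the Antarctic) and T_local (period at a fixed longitude) are introduced here only for illustration.

```latex
T_{\mathrm{local}} = \frac{T_{\mathrm{circ}}}{k}
\qquad\Longrightarrow\qquad
T_{\mathrm{local}} = \frac{8\ \mathrm{yr}}{2} = 4\ \mathrm{yr} = 48\ \mathrm{months}
```

For a two-wavelength pattern that takes about 8 years to encircle the Antarctic, this gives a local period of about 48 months, which lies inside the 3-7-year passband and is close to the roughly 50-month period of the ACW.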

By applying the EOF analysis to the filtered monthly SST anomalies, EOF1 was again taken as the ACW, and its time series is shown in Figure 7. The reference data result shows three complete waves and explained over half of the total variance. This exceeded the SLP result, which indicates that there were too many short-time signals in the atmosphere to filter out, while in the ocean, long-time signals were strong and steady, resulting in a high signal-to-noise ratio. Only 1 out of 12 CMIP5 models, namely, the FIO-ESM model, had a larger explained variance than the reference data, while three, the FGOALS-g3, GISS-E2-1-H and BCC-CSM models, had small explained variances of less than 40%. The EOF1 time series of the ACCESS1.3, CNRM-CM5, BCC-CSM1.1 and FIO-ESM models were significantly positively related to the reference data result, with the ACCESS1.3 model reaching the highest correlation, over 0.98. However, all CMIP6 models had a lower explained variance than the reference data, and the HadGEM3-GC31-LL model had the lowest, around 33%. This is consistent with the Taylor analysis result mentioned in Section 3.1, suggesting a weaker SST signal in the CMIP6 model simulations. It is also notable that the three models with a chemistry module, i.e., the CESM2, GFDL-CM4 and CanESM5 models, had significantly positive correlations with the EOF1 time series from the reference data, supporting the above finding that the chemistry module helps to improve SST simulations.
Figure 7. Time series and spatial pattern for the first EOF mode of SST from the ERA5 dataset (a,b) and selected models (c-g), with the explained percentage of variance given on the top right (5-95% percentile shaded). Models (c-g) are the same as Figure 6 (b-f).
Figure 8 shows the spectrum analysis result of the SST EOF1 time series. The peak of the reference data corresponded exactly to the 50-month period of the ACW and exceeded the 95% red noise confidence level. The ACCESS1.3 model also generated a similar spectrum. Among the other CMIP5 models, the main peak of FGOALS-g2 failed the significance test, and the significant period of the GISS-E2-H and FIO-ESM models was below 40 months while that of the IPSL-CM5A-LR model exceeded 70 months. However, all CMIP6 models except FIO-ESM-2-0 reproduced the significant 50-month period of the ACW. In general, CMIP6 models could simulate the ACW period more accurately than CMIP5 models, especially those with a chemistry module.
Figure 8. Spectra for bandpass-filtered first EOF time series of SST from the ERA5 dataset (a) and selected models (red line is the red noise, and dashed lines are the 5% and 95% confidence bounds). Models (b-f) are the same as Figure 6 (b-f).
In summary, five models could simulate the SST signal of the ACW well: the ACCESS1.3 and CNRM-CM5 models from CMIP5, and the CESM2, GFDL-CM4 and CanESM5 models from CMIP6.

Conclusions
The ACW is a crucial air-sea coupling system in the Southern Hemisphere, and it further influences the global climate. Using White and Peterson's method of extracting the ACW from SLP and SST fields, we compared the simulated ACWs from historical experiments of the CMIP5 and CMIP6 models with those from the ERA5 data during the most active period of the 1980s and 1990s. The main findings were as follows.
(1) For SLP simulation, models had low correlations with the reanalysis data. The CMIP6 models show a slightly better performance than the CMIP5 models. Three models, namely, the FIO-ESM model from CMIP5 and the CanESM5 and GFDL-ESM4 models from CMIP6, performed well. Models with an atmospheric chemistry module had better simulations.
(2) For SST simulation, the CMIP6 models reproduced weaker but more reliable signals than the CMIP5 models. Five models, the ACCESS1.3 and CNRM-CM5 models from CMIP5 and the CESM2, GFDL-CM4 and CanESM5 models from CMIP6, performed well. Models with a carbon cycle process and a chemistry module tended to produce better simulations.
(3) When both SLP and SST were taken into consideration, most CMIP6 models show an improvement compared with the CMIP5 version. Models simulated SST better than SLP. The best simulation was produced by the CanESM5 model. Most models could capture the 50-month period of SLP and SST signals in the ACW. The main problem remained in generating the correct phase.
It should be noted that we evaluated the ACW SLP and SST signals separately, without considering their correlation, even though they have stable phase differences. Further research may investigate other variables and the correlations between them. Moreover, the physical mechanism behind the model spread remains a subject for further research. Finally, although the EOF method has advantages in presenting the ACW in the form of a standing wave, it cannot reveal the propagation trend [16,42]. More suitable statistical methods and new variables appropriate for representing the ACW should be adopted in future studies to deepen our understanding.