Atlantic Niño/Niña Prediction Skills in NMME Models

Abstract: The Atlantic Niño/Niña, one of the dominant modes of interannual variability in the equatorial Atlantic, exerts a prominent influence on the Earth's climate, but its previously reported prediction skill was unsatisfactory and limited to two to three months. By diagnosing the recently released North American Multimodel Ensemble (NMME) models, we find that the Atlantic Niño/Niña prediction skills are improved, with the multi-model ensemble (MME) reaching five months. The prediction skills are season-dependent. Specifically, they show a marked dip in boreal spring, suggesting that the Atlantic Niño/Niña prediction suffers a "spring predictability barrier" like ENSO. The prediction skill is higher for the Atlantic Niña than for the Atlantic Niño, and better in the developing phase than in the decaying phase. The amplitude bias of the Atlantic Niño/Niña is primarily attributed to the amplitude bias in the annual cycle of the equatorial sea surface temperature (SST). The anomaly correlation coefficient scores of the Atlantic Niño/Niña depend, to a large extent, on the prediction skill of the Niño3.4 index in the preceding boreal winter, implying that the preceding ENSO may greatly affect the development of the Atlantic Niño/Niña in the following boreal summer.


Introduction
The sea surface temperature (SST) in the tropics is one of the major sources of predictability, since tropical SST plays a central role in affecting global weather and climate through its teleconnection effects [1][2][3][4][5]. A prominent interannual variation of SST in the tropics is the El Niño-Southern Oscillation (ENSO), a well-known phenomenon due to its far-reaching global effects. Likewise, the SST variability in the equatorial Atlantic impacts the weather and climate on the surrounding continents [6][7][8][9]. In previous studies, various names have been used for the interannual SST variability in the eastern equatorial Atlantic, e.g., the Atlantic coupled mode [10], Atlantic zonal mode [11], Atlantic El Niño mode [12], Atlantic Niño [13,14], and so on. To facilitate the discussion, here we use the terminology of Atlantic Niño and Atlantic Niña (hereafter Atlantic Niño/Niña) to denote the anomalous warming and cooling SST events occurring in the eastern equatorial Atlantic, respectively. As pointed out by Zebiak [10], the features of the Atlantic Niño and Niña resemble those of ENSO in the equatorial Pacific, and the ocean-atmosphere coupled processes involved in the Atlantic variability dynamics are, to the first order, similar to those in ENSO [4,[15][16][17]. Nonetheless, in contrast to ENSO, the Atlantic variability has some distinctive characteristics, e.g., it has a relatively small zonal scale and a relatively short life cycle, and it usually peaks in boreal early summer (June-July) [13,17,18]. Recent assessments have suggested better seasonal prediction skills in the North American Multimodel Ensemble (NMME) models than in previous coupled models. However, few studies have paid attention to the prediction skill of the SST anomalies in the equatorial Atlantic region associated with the Atlantic Niño/Niña in the NMME models. This prompts us to investigate the prediction skill of the Atlantic Niño/Niña in the retrospective forecasts performed by the latest NMME models.
Note that "prediction" and "forecast" are used synonymously throughout the remainder of the paper.
In terms of forecast errors, previous studies suggested that mean state simulation biases may greatly affect the prediction of seasonal-to-interannual SST anomalies in coupled models (e.g., [2,48]), in addition to errors in model initialization. A number of studies have pointed out that current coupled models show significant mean state biases in the Atlantic sector. For instance, a warm bias of mean SST in the eastern equatorial Atlantic is prevalent in coupled models [49][50][51], which has been related to mean state biases in the trade winds, alongshore winds and coastal upwelling [52], and/or attributed to the unrealistic representation of cloud cover and the ocean mixed layer in coupled models [53]. Such mean state biases, especially the warm SST bias, may affect the Atlantic variability prediction, as suggested by Richter et al. [48]. Another argument is that the ability to predict ENSO may affect the predictive skill of the SST anomalies associated with the Atlantic Niño and Atlantic Niña, due to the strong connection between ENSO and the SST anomalies in the Atlantic sector [36,52]. A recent study [54] found that most Atlantic Niño/Niña events are preconditioned in boreal spring by either the Atlantic meridional mode or Pacific SST conditions such as ENSO, while for some events with a relatively late onset there is no clear source of external forcing. This indicates that whether ENSO influences the Atlantic Niño/Niña in seasonal prediction systems needs to be examined. Therefore, this study will investigate whether there are any common factors responsible for the skill in predicting the SST anomalies associated with the Atlantic Niño/Niña, based on the NMME models.
The paper is organized as follows. Section 2 introduces the NMME models and the datasets used in this study. Section 3 presents the overall prediction skill of the Atlantic Niño/Niña, the corresponding seasonality of the prediction skill, and comparisons of the prediction skill between the developing and decaying phases and between the warm and cold events. Section 4 presents the prediction biases in the mean states and investigates the possible factors responsible for the forecast errors based on the multiple model results. Finally, a summary is given in Section 5.

Models and Data
The North American Multi-model Ensemble (NMME) project is a multi-model forecasting system composed of coupled models from modeling centers in the United States and Canada [37]. The analysis in this study is based on the forecast results of 13 coupled models that participated in NMME Phase 1 and Phase 2: CanCM4i, CanSIPSv2, CMC1-CanCM3, CMC2-CanCM4, GEM-NEMO, NASA-GEOSS2S, NCAR-CESM1, and NCEP-CFSv2 in Phase 1, and CanCM3, CanCM4, CCSM4, CESM1, and FLORB-01 in Phase 2. Table 1 gives a brief summary of the model specifications, including the forecast length, lead time and ensemble size. The retrospective forecasts encompass at least 10 members, and the lead time ranges from 10 to 12 months. In this study, we focus on the common period of 1980-2010 and analyze the prediction skills of each NMME model and of the MME. In general, the MME prediction skill is better than that of most individual models and than the averaged skill of all models [21,38]. When forecast skill is discussed without reference to a specific model, it refers to the MME of the 13 NMME models. For verification, the monthly SST dataset from HadISST [55] is utilized. Some conventions are clarified below. A "one-month-lead forecast" means a forecast valid for the starting month itself, i.e., from the first day of the starting month through the end of that month. In this sense, a "two-month-lead forecast" denotes a forecast valid for the second month after initialization. For example, for the forecasts initialized on 1 January 2000, the first monthly mean (i.e., the average over 1-31 January 2000) is defined as one-month-lead, the second monthly mean (i.e., the average over 1-28 February 2000) is defined as two-month-lead, and the remaining lead times are defined analogously.
The "anomaly" is defined as the departure from the long-term annual cycle rather than from the annual mean. Specifically, for a given model, we first obtain the annual cycle of SST at each lead time (note that the annual cycle of SST in the same model may vary as the lead time increases). For a given lead time, the annual cycle of SST at that lead time is subtracted from the raw outputs to obtain the SST anomalies. In this way, both the anomaly field and the mean state field (i.e., the annual cycle field) are obtained. The "bias" is the deviation of a predicted quantity from its observed counterpart (i.e., model forecast minus observation), and includes both an anomaly bias and a mean state bias.
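The lead-dependent anomaly computation described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical flat data layout in which each hindcast value is keyed by (year, start month, lead); the actual NMME archives are gridded and multi-member.

```python
from collections import defaultdict

def lead_dependent_anomalies(forecasts, years):
    """Subtract, at each lead, the climatology of the verifying calendar month.

    forecasts: {(year, start_month, lead): predicted SST} (hypothetical layout)
    years: hindcast years used to build the lead-dependent climatology
    """
    clim = defaultdict(list)
    for (year, start, lead), sst in forecasts.items():
        if year in years:
            # verifying calendar month: lead 1 verifies the start month itself
            target = (start - 1 + lead - 1) % 12 + 1
            clim[(target, lead)].append(sst)
    clim = {k: sum(v) / len(v) for k, v in clim.items()}

    anoms = {}
    for (year, start, lead), sst in forecasts.items():
        target = (start - 1 + lead - 1) % 12 + 1
        anoms[(year, start, lead)] = sst - clim[(target, lead)]
    return anoms
```

Because the climatology is keyed by (verifying month, lead), a model whose drift grows with lead time gets a different reference at each lead, exactly as described in the text.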
This study applies an equally weighted multi-model averaging strategy when obtaining the multi-model ensemble result, as eleven of the thirteen models have ten members and only two models have more. Additionally, the linear trend has not been removed. It is worth mentioning that a previous study [56] investigated the impact of the global warming trend on seasonal-to-interannual climate prediction. It found that the impact of the SST warming trend is negligible for short-lead predictions, and that the linear trend needs to be considered only for seasonal predictions at long lead times (more than one year). It further pointed out that the impact of the SST warming trend on seasonal prediction is considerable for the extratropical oceans but marginal for the tropical oceans. Therefore, for the case in this study, the SST linear trend should not influence the prediction skill of the Atlantic Niño/Niña. For the deterministic measures, previous studies [21,[57][58][59] usually chose 0.6 or 0.5 as the cutoff value for the anomaly correlation coefficient, and chose one standard deviation of the oscillation itself, or a value slightly less than one standard deviation, as the threshold for the root-mean-squared error.

Results
In this section, the prediction skills for the SST anomalies in the eastern equatorial Atlantic from the 13 coupled models in NMME Phase 1 and Phase 2 are investigated in detail. Figure 1a shows the standard deviation (STD) of the SST anomalies in the equatorial Atlantic Ocean. In general, the STD of the SST anomalies is large in the eastern equatorial Atlantic, with values reaching 0.5 K (Figure 1a). Previous studies (e.g., [10]) introduced an SST index called the Atl3 index, the SST anomalies (SSTA) averaged over the region 3°S-3°N, 20°W-0°, to define Atlantic Niño or Niña events. Analogous to what the Niño3.4 index (SSTA averaged over 5°S-5°N, 170°W-120°W) means to ENSO, the Atl3 index is used to measure the variability of the Atlantic Niño/Niña and to gauge its prediction skill. Previous studies found that the Atlantic Niño/Niña usually develops in boreal spring, matures in boreal summer and then decays in the subsequent months. As indicated in Figure 1b, the evolution of the STD of the Atl3 index with calendar month shows that the Atlantic Niño/Niña prefers to peak in boreal summer. Figure 1c shows the time evolution of the Atl3 index derived from the observation and from the MME prediction made by the NMME models. In general, the fluctuations of the Atl3 index derived from the two- and five-month-lead MME forecasts resemble those in the observation to some extent. An obvious bias is that the amplitude of the Atlantic Niño/Niña is underestimated in the MME forecasts (Figure 1c). The amplitudes of the predicted Atlantic Niño/Niña in the MME forecasts at two- to five-month leads are shown in Table 2. The STD of the Atl3 index in the observation is 0.465, whereas the STD of the predicted Atl3 index in the MME forecast is 0.296 at two-month lead. This underestimation worsens with increasing lead time, the STD dropping to only 0.192 in the five-month-lead forecast.
The amplitude bias in predicting the Atlantic Niño/Niña motivates the further exploration in Section 4.

Table 2. The standard deviation of the observed Atl3 index and of the predicted Atl3 index in the MME. The time series of the predicted Atl3 index are displayed by the colored lines in Figure 1c.

                       STD of Atl3 Index
Observation            0.465
Two-month-lead MME     0.296
Three-month-lead MME   0.245
Four-month-lead MME    0.213
Five-month-lead MME    0.192

To quantitatively assess the predictive skill of the Atlantic Niño/Niña, an analysis based on the anomaly correlation coefficient (ACC) and the root-mean-squared error (RMSE) is carried out. The ACC between the Atl3 index in the forecasts and its counterpart in the observation is displayed in Figure 2a. In most of the individual models, the ACC scores are above persistence, although a few model forecasts show poor ACC scores. When 0.6 is chosen as the cutoff value for the ACC, the prediction skills in half of the NMME models reach three months. Among the NMME models, CanCM4i and CanSIPSv2 show the best skill in predicting the Atlantic Niño/Niña. The MME result shows better prediction skill than any individual model. Specifically, the prediction skill of the MME reaches 6 (more than 4) months when 0.5 (0.6) is chosen as the passing score for the ACC. The RMSE results for each model and for the MME are displayed in Figure 2b. From the perspective of the RMSE, most of the models are capable of predicting the Atl3 index at 3-month lead with an RMSE of less than 0.4. In particular, four models (CanCM4i, CanSIPSv2, CMC1-CanCM3, and CMC2-CanCM4) show RMSE values below 0.4 at 4-month lead. When one STD of the observed Atl3 index (i.e., 0.465) is chosen as the threshold value (see the grey dashed line in Figure 2b), most of the models have the ability to predict the Atl3 index at 7-month lead or even 12-month lead. Likewise, the MME result outperforms any individual model in terms of the RMSE. Specifically, the prediction skill of the MME reaches nearly five months when 0.4 is chosen as the threshold value. Both the ACC and RMSE results demonstrate that the MME approach is an effective way to obtain better skill in predicting the Atlantic Niño/Niña.
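The two deterministic metrics used above can be written compactly. The following is a minimal sketch; the function names and the plain-list data layout are illustrative, not from the paper.

```python
import math

def acc(forecast, observed):
    """Anomaly correlation coefficient between two anomaly time series."""
    fm = sum(forecast) / len(forecast)
    om = sum(observed) / len(observed)
    cov = sum((f - fm) * (o - om) for f, o in zip(forecast, observed))
    fsd = math.sqrt(sum((f - fm) ** 2 for f in forecast))
    osd = math.sqrt(sum((o - om) ** 2 for o in observed))
    return cov / (fsd * osd)

def rmse(forecast, observed):
    """Root-mean-squared error between forecast and observed series."""
    n = len(forecast)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)
```

With these definitions, the cutoffs used in the text (ACC above 0.5 or 0.6, RMSE below 0.4 or below one observed STD) can be applied directly at each lead time.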
Previous studies have pointed out that the prediction skill of the Atlantic Niño/Niña reaches around two months, or at best up to three months for some particular initial months, in either statistical models [31,34] or dynamical models [35]. From the perspective of individual models, more than half of the NMME models exhibit slightly better prediction skills for the Atlantic Niño/Niña than previous models. Among the NMME models, CanCM4i and CanSIPSv2 show the best prediction skill. From the perspective of the MME, Stockdale et al. [35] showed that the MME forecast skill based on the DEMETER models reaches only 4 months with an RMSE of less than 0.4 (see Figure 10b in Stockdale et al. [35]). Our result shows that the prediction skill of the MME made by the NMME models reaches five months with an RMSE of less than 0.4. It is thus encouraging that the current NMME models exhibit slight improvements in the prediction skill of the Atlantic Niño/Niña compared to previous models.

Seasonal Dependence of the Atlantic Niño/Niña Prediction Skill
We further investigate the seasonal dependence of the Atlantic Niño/Niña prediction skill in this section. Figure 3 provides the ACC for the MME as a function of lead time (horizontal axis) and forecast start month (vertical axis). The ACC results show clear seasonality. The forecasts starting in boreal summer (May, June, July and August) have higher skill than those starting in boreal winter (December, January, and February). For forecasts starting in boreal summer, the ACC remains above 0.6 (0.5) through 6 (8-10) months of lead time, whereas for forecasts starting in boreal winter, the ACC remains above 0.6 (0.5) for only 3-4 (2-4) months of lead time. It is noted that the MME ACC for the forecasts starting in December, January, February and March rebounds at lead times of 7 to 10 months, following the quick decline of the prediction skill in the first 3 months. We argue that this rebound is spurious and is primarily attributable to error cancellation in the MME mean, because the skill of the forecasts starting in boreal winter drops sharply once the forecast lead time exceeds 3 months, and the forecasts at 7- to 10-month leads are almost totally wrong in any individual model (figure not shown). It is worth mentioning that such a skill rebound has also been noted in other relevant seasonal prediction studies [21,27,59,60]. Decomposing the RMSE results by forecast start month reveals a similar seasonality; specifically, the skill of the forecasts starting in boreal summer is higher than that of those starting in boreal winter. Therefore, our results show that the prediction skill of the Atlantic Niño/Niña is seasonally dependent.
Based on the ACC and RMSE results, the prediction skill of the Atlantic Niño/Niña generally reaches more than 6 months for forecasts starting from May to November, but is limited to within 4 months for forecasts starting in boreal winter. These results show a marked dip in prediction skill across boreal spring, indicating that the prediction of the Atlantic Niño/Niña in the NMME models suffers a "spring predictability barrier". It is worth mentioning that previous studies recognized that the ENSO prediction skill also declines promptly in boreal spring, and such a "spring predictability barrier" for ENSO prediction widely occurs in dynamical model forecasts [2,20,61,62]. The spring predictability barrier in the Atlantic Niño/Niña prediction may be partly due to the lack of stochastic noise forcing in the climate models used for the forecasts, as has been argued for ENSO prediction; that is, a previous study pointed out that the spring predictability barrier for ENSO is largely attributable to the lack of stochastic noise in climate models [63]. The specific reasons for the spring predictability barrier in the Atlantic Niño/Niña prediction are complicated and merit thorough investigation in the future.

Comparisons for the Atlantic Niño and Niña Prediction Skills
To specify the prediction skill of the Atlantic variability in detail, we further evaluate the forecast skills for the Atlantic Niño and the Atlantic Niña separately. We first selected seven Atlantic Niño events (1987, 1988, 1991, 1995, 1996, 1999, 2008) and six Atlantic Niña events (1982, 1983, 1992, 1994, 1997, 2005) in the observation. The criterion for an Atlantic Niño (Niña) event is that the observed SST anomalies exceed plus (minus) one standard deviation for at least two consecutive months. In general, most Atlantic Niño/Niña events initiate in early boreal spring, peak in June-July, and then decay in the following months. The ACC for the selected Atlantic Niño and Atlantic Niña events in the MME forecasts starting in the developing phase (i.e., February, March) and the decaying phase (i.e., August, September) is shown in Figure 4. Obviously, the forecasts starting in the decaying phase of the Atlantic Niño (red lines in Figure 4c,d) have higher skills than those starting in the developing phase (red lines in Figure 4a,b). The same holds for the Atlantic Niña (blue lines in Figure 4c,d versus blue lines in Figure 4a,b). The fact that the forecast skills are lowest in the developing phase is consistent with the "spring predictability barrier" of the Atlantic Niño/Niña prediction. The forecast skills for the Atlantic Niño are further compared with those for the Atlantic Niña. The forecast skills for the Atlantic Niña are generally higher than those for the Atlantic Niño (red lines versus blue lines in Figure 4), regardless of whether the developing or the decaying phase is considered.
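The event-selection criterion above (anomalies exceeding plus or minus one standard deviation for at least two consecutive months) can be sketched as follows; the dictionary layout of monthly Atl3 anomalies per year is a hypothetical simplification.

```python
def select_events(monthly_anom_by_year, std, sign=+1, min_run=2):
    """Return years in which sign * anomaly exceeds std for >= min_run
    consecutive months (sign=+1 for Niño events, sign=-1 for Niña events)."""
    events = []
    for year, series in sorted(monthly_anom_by_year.items()):
        run = 0
        for x in series:
            run = run + 1 if sign * x > std else 0  # reset on a non-exceeding month
            if run >= min_run:
                events.append(year)
                break
    return events
```

A year with a single isolated exceedance is rejected, so short-lived spikes do not qualify as events.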
Regarding the reasons for the contrasting prediction skills between the Atlantic Niño and the Atlantic Niña, a preliminary analysis is conducted. Following some recent studies (e.g., Larson and Kirtman [64]; Hu et al. [65]), we calculated the signal-to-noise ratio (SNR) of the zonal wind anomalies for the models. Due to data availability, only the eight models that provided zonal wind outputs are used here. As shown in Figure 5, all the models show that the SNR for the Atlantic Niña prediction is obviously larger than that for the Atlantic Niño prediction, indicating that the Atlantic Niña is more predictable than the Atlantic Niño. This may partly explain why the prediction skill of the Atlantic Niña is higher than that of the Atlantic Niño in the NMME models. It is worth mentioning that this result (i.e., the SNR for the Atlantic Niña being larger than that for the Atlantic Niño) still holds when using the forecast results at different lead times or slightly altering the specific equatorial Atlantic region used for the calculation (figures not shown).
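One common way to form such a signal-to-noise ratio is sketched below, under the assumption that the signal is the standard deviation of the ensemble-mean series and the noise is the mean intra-ensemble spread; the paper's exact definition follows Larson and Kirtman [64] and Hu et al. [65] and may differ in detail.

```python
import math

def signal_to_noise(member_forecasts):
    """SNR of an ensemble forecast series.

    member_forecasts: list over forecast cases of lists over members (anomalies).
    Signal: std of the ensemble-mean series; noise: mean intra-ensemble std.
    """
    means = [sum(m) / len(m) for m in member_forecasts]
    mu = sum(means) / len(means)
    signal = math.sqrt(sum((x - mu) ** 2 for x in means) / len(means))

    spreads = []
    for members, em in zip(member_forecasts, means):
        spreads.append(math.sqrt(sum((x - em) ** 2 for x in members) / len(members)))
    noise = sum(spreads) / len(spreads)
    return signal / noise
```

A larger SNR means the predictable (ensemble-mean) part of the variability dominates the member-to-member scatter, i.e., the phenomenon is more predictable.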

Overall Probability Forecast Skill
In addition to the deterministic measures, we further assessed the prediction skill in a probabilistic sense. Specifically, probabilistic measures, including the Brier skill score (BSS), the ranked probability skill score (RPSS) and the relative operating characteristics (ROC), are used to verify the probabilistic forecasts. In general, the results based on the probabilistic measures are consistent with those based on the deterministic measures. Here two representative models (CMC1-CanCM3 and CanCM4i) are selected for presenting detailed results in the probabilistic sense.

BSS
The Brier score (BS) is a widely used verification measure to assess the accuracy of probability forecasts. It is the mean squared error between the forecast probability and the observed frequency. To clarify the definition of the BS, we first define the forecast probability for the ith forecast (P_i) as

P_i = M_i / M, (1)

where M_i is the number of ensemble members that forecast the event and M is the total number of members. Then the BS can be obtained as follows:

BS = (1/N) Σ_{i=1}^{N} (P_i − O_i)^2, (2)

where N is the number of forecasts and the observed frequency O_i is given by

O_i = 1 if the event occurred in the ith case, and O_i = 0 otherwise. (3)

To compare the BS with that of a reference forecast system (BS_ref), the Brier skill score (BSS) is defined by the BS and the reference forecast as

BSS = 1 − BS / BS_ref,

where BS_ref denotes the BS of the reference forecast, obtained by

BS_ref = (1/N) Σ_{i=1}^{N} (O̅ − O_i)^2.

Here the observed mean frequency O̅ is taken as the climatological probability. In this way, the BSS indicates the skill of the probability forecast relative to the climatological forecast. A BSS greater than 0 means the probability forecast is skillful, and a BSS of 1 means the probability forecast is perfect.
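The BS/BSS computation can be sketched for a single binary event, assuming the forecast probability is the fraction of ensemble members predicting the event; the layout of the inputs is illustrative.

```python
def brier_skill_score(member_matrix, outcomes):
    """BSS relative to the climatological forecast.

    member_matrix[i]: binary member forecasts (1 = event predicted) for case i.
    outcomes[i]: 1 if the event occurred in case i, else 0.
    """
    n = len(outcomes)
    # forecast probability P_i = M_i / M
    probs = [sum(m) / len(m) for m in member_matrix]
    bs = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / n
    clim = sum(outcomes) / n                     # climatological frequency
    bs_ref = sum((clim - o) ** 2 for o in outcomes) / n
    return 1.0 - bs / bs_ref
```

A forecast system that always issues the climatological probability scores BSS = 0, and a perfect deterministic ensemble scores BSS = 1.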
With the aid of the BSS, we investigated the probabilistic prediction skill of the Atlantic Niño/Niña in two representative models (CMC1-CanCM3 and CanCM4i) over the verification period (29 years). Each model has 10 members, and the ensemble members are weighted equally within a given model. Figure 6 shows the BSS results for the Atlantic Niño prediction (red) and the Atlantic Niña prediction (blue) in the CanCM4i (solid lines) and CMC1-CanCM3 (dashed lines) models. In general, the BSS results reveal that the prediction skill can reach four or even five months for the Atlantic Niño and the Atlantic Niña in these two models. Moreover, the probabilistic forecast skill for the Atlantic Niña is superior to that for the Atlantic Niño in both models. For instance, the BSS for the Atlantic Niña is greater than 0.6 at the 2- and 3-month-lead forecasts and remains above 0.2 at the 5-month-lead forecast, whereas the BSS for the Atlantic Niño drops close to 0 at the 4-month-lead forecast. The finding that the prediction skill for the Atlantic Niña is higher than that for the Atlantic Niño agrees with the earlier finding based on the deterministic measures.

RPSS
The ranked probability skill score (RPSS) is another widely used method to verify probability forecasts. The RPSS is based on the ranked probability score (RPS) of a set of forecasts. For the ith forecast, the RPS is defined as the sum of squared differences between the cumulative probabilities of the forecasts and the corresponding cumulative outcomes in the observation:

RPS_i = Σ_{k=1}^{l} [ (Σ_{j=1}^{k} P_{j,i}) − (Σ_{j=1}^{k} O_{j,i}) ]^2.

In this study, the Atlantic Niño/Niña events are grouped into l = 3 categories (k = 1, 2, 3): Atlantic Niño years, Atlantic Niña years, and neutral years. For the ith forecast, P_{k,i} is the forecast probability of the kth category (k = 1, . . . , l), obtained based on Equation (1). Similarly, O_{k,i} is the observed occurrence of the kth category, obtained based on Equation (3). Σ_{j=1}^{k} P_{j,i} is thus the cumulative forecast probability up to the kth category, and Σ_{j=1}^{k} O_{j,i} is the corresponding cumulative outcome in the observation. For the ith forecast, the RPS is the sum of squared differences over the three categories.
Then the RPSS is defined as follows:

RPSS = 1 − RPS / RPS_ref,

where RPS and RPS_ref are the ranked probability scores of the forecasts being evaluated and of a reference forecast, respectively, each averaged over all forecasts. For RPS_ref, constant climatological categorical probabilities are used, with Q = [0.31, 0.38, 0.31], following Tippett et al. [66]. As with the BSS, a good probabilistic forecast should have a small RPS and a large RPSS. We next used the RPSS to measure the errors between the probability forecasts and the observed frequencies for the Atl3 index. It is worth mentioning that three categories are applied here, and the results are not sensitive to the number of categories, as a previous study [66] stressed that the forecast skill does not depend strongly on the number of forecast categories. As shown in Figure 7, the RPSS is greater than 0.4 at short lead times for both models. The RPSS reaches 0.15 for the 4-month-lead forecast in CanCM4i, and 0.1 for the 5-month-lead forecast in CMC1-CanCM3. This indicates that, from the perspective of the RPSS, the prediction skill reaches 4 months for CanCM4i and 5 months for CMC1-CanCM3, which is generally consistent with the BSS results.
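A minimal sketch of the RPS/RPSS computation for three categories, with the climatological reference probabilities Q = [0.31, 0.38, 0.31] as the default; the list-based inputs are illustrative.

```python
def rps(probs, obs_cat, ncat=3):
    """Ranked probability score for one forecast.

    probs: forecast probabilities of the ncat categories (sum to 1).
    obs_cat: index (0-based) of the observed category.
    """
    cum_p, cum_o, score = 0.0, 0.0, 0.0
    for k in range(ncat):
        cum_p += probs[k]
        cum_o += 1.0 if k == obs_cat else 0.0
        score += (cum_p - cum_o) ** 2        # squared cumulative difference
    return score

def rpss(forecast_probs, obs_cats, ref_probs=(0.31, 0.38, 0.31)):
    """RPSS relative to a constant climatological reference forecast."""
    mean_rps = sum(rps(p, o) for p, o in zip(forecast_probs, obs_cats)) / len(obs_cats)
    mean_ref = sum(rps(ref_probs, o) for o in obs_cats) / len(obs_cats)
    return 1.0 - mean_rps / mean_ref
```

Because the RPS compares cumulative probabilities, a forecast that places probability in a category adjacent to the observed one is penalized less than one that places it two categories away, which is what makes the RPS suited to ordered categories.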

ROC
Another commonly used method of verifying probability forecasts is the relative operating characteristics (ROC), as applied to seasonal forecasts by Kirtman [67]. Relying on the hit rate (HR) and the false alarm rate (FAR), the ROC curve summarizes probability forecasts constructed from different numbers of agreeing ensemble members:

HR = O1 / (O1 + O2),   FAR = NO1 / (NO1 + NO2),

where O1 indicates the number of correct forecasts of the events, O2 is the number of misses, NO1 denotes the number of false alarms, and NO2 is the number of correct rejections. The HR is the fraction of observed events that were correctly forewarned, and the FAR is the fraction of nonevents that occurred after a warning was issued. An HR of one means that all occurrences of the event were correctly predicted, and an HR of zero indicates that none of the events were correctly predicted. The FAR also ranges from zero to one, with a value of zero indicating that no false alarms were issued. Therefore, an ideally perfect probabilistic forecast system would have an HR of 1 and an FAR of 0. The ROC curve consists of ten points (for the models used here, there are ten members) distributed from the lower-left corner to the upper-right corner. The first point on the ROC curve corresponds to the case in which ten out of ten ensemble members forecast a particular event, indicating how skillful the model is when all members consistently forecast the given event to occur. The second point corresponds to nine out of ten ensemble members forecasting the event, and the remaining points along the curve vary analogously. In this sense, if all the points on the ROC curve lie close to the diagonal line, the probabilistic forecast system has no skill; if the points cluster close to the upper-left corner, the model has high probabilistic forecast skill.
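The HR/FAR pairs that make up the ROC curve can be sketched as follows, assuming binary member forecasts and one threshold per possible number of agreeing members (strictest first, matching the point ordering described above); the input layout is illustrative.

```python
def roc_points(member_matrix, outcomes):
    """One (FAR, HR) point per member-agreement threshold m, m = M..1.

    member_matrix[i]: binary member forecasts (1 = event predicted) for case i.
    outcomes[i]: 1 if the event occurred in case i, else 0.
    """
    M = len(member_matrix[0])
    points = []
    for m in range(M, 0, -1):                # strictest criterion first
        hits = misses = false_alarms = correct_neg = 0
        for members, obs in zip(member_matrix, outcomes):
            warn = sum(members) >= m         # warning issued if >= m members agree
            if obs:
                hits += warn
                misses += not warn
            else:
                false_alarms += warn
                correct_neg += not warn
        hr = hits / max(hits + misses, 1)
        far = false_alarms / max(false_alarms + correct_neg, 1)
        points.append((far, hr))
    return points
```

Lowering the required number of agreeing members moves the point up and to the right along the curve: more events are caught, at the cost of more false alarms.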
Finally, we assessed the probabilistic forecast skill using the relative operating characteristics (ROC) analysis. To reduce the uncertainty associated with exceptional member forecasts in the ROC curves, we recompute the ROC curves using a Monte Carlo technique, following Kirtman [67]. Specifically, for a given model with ten members, we randomly select nine out of the ten samples to calculate the ROC results, and the average of the ROC results derived from the different combinations is given. Figure 8 shows the ROC curves for the Atl3 hindcasts at lead times of three, four, and five months. In general, the ROC curves show that both models have considerable skill in predicting the warm events (red curves), cold events (blue curves) and neutral conditions (green curves) at the three-, four-, and five-month-lead forecasts. For the three-month-lead forecasts, all three ROC curves are far from the diagonal in the two models, indicating that both models have large hit rates but small false alarm rates at three-month lead. As the lead time increases, the ROC curves gradually approach the diagonal, indicating a drop in probabilistic forecast skill. For instance, the ROC curves for the Atlantic Niño and neutral conditions in the five-month-lead forecasts lie close to the diagonal, indicating relatively low probabilistic forecast skill at five-month lead. It is further found that the probabilistic forecast skill for the Atlantic Niña (blue curves) is higher than those for the Atlantic Niño (red curves) and neutral conditions (green curves), and this contrast holds for all the three-, four-, and five-month-lead forecasts analyzed here. The probabilistic forecast skills revealed by the ROC curves are generally consistent with the findings based on the BSS and RPSS.
Overall, the analysis based on the three methods (BSS, RPSS, and ROC) shows consistent results; that is, the prediction skill of the two models reaches about four or even five months, and the prediction skill for the Atlantic Niña is higher than that for the Atlantic Niño. These results in the probabilistic sense agree well with the previous findings based on the deterministic metrics. It is worth mentioning that many previous studies (e.g., Wang et al. [21]; Liu et al. [68]) have pointed out that the probabilistic forecast skill and the deterministic forecast skill have a monotonic nonlinear relationship, and the results from these two perspectives can be considered complementary to each other.

Possible Factors Responsible for the Atlantic Niño/Niña Forecast Errors
The analysis above has revealed the amplitude bias of the predicted Atlantic Niño/Niña and the prediction skills based on the ACC results in the NMME models. In this section, we investigate the possible factors contributing to the Atlantic Niño/Niña forecast errors. Previous studies have pointed out that the background mean state is essential for the Atlantic Niño/Niña development in the observation [13,15,69], and for the Atlantic Niño/Niña simulation and prediction [49,70,71]. Therefore, we examine whether the mean state biases contribute to the Atlantic Niño/Niña forecast errors. Besides, as the remote influence of ENSO on the Atlantic SST variability is widely recognized [9,14,72], we also examine whether the prediction skill for ENSO, or for the SST anomalies in the central-eastern equatorial Pacific, affects the forecast skill of the Atlantic Niño/Niña. Admittedly, the factors determining the forecast skill may be model-dependent. Our strategy here is to identify whether there are any common factors influencing the forecast skills across the NMME models.

Mean State Biases in Equatorial Atlantic Sector
We first present the mean state bias and the annual cycle bias in the equatorial Atlantic sector. Figure 9 shows the mean SST bias for the MME result (Figure 9a) and for each model (Figure 9b-n). In general, almost all the NMME models (except NASA-GEOSS2S and CCSM4) show some common features in the spatial pattern of the mean SST bias. In particular, there is an obvious warm bias of mean SST located in the central-eastern equatorial Atlantic, which extends along the west African coast. In the MME result, the mean SST bias reaches around 1 °C east of 20°W. In contrast, a cold bias of mean SST emerges along the western edge of the equatorial Atlantic basin, leading to a weakening of the zonal gradient of mean SST in the equatorial Atlantic. Accordingly, the MME shows a weakening of the easterly trade winds and a reduced mean thermocline slope in the equatorial Atlantic (figure not shown), which is consistent with the mean SST bias. Figure 9. The biases of the predicted mean SST for (a) the MME and (b-n) the individual models. The mean SST biases shown here are derived from the four-month-lead forecasts and resemble those at other lead times. In general, the mean state biases have a similar spatial pattern across forecasts at different lead times, except that the magnitude of the biases increases with increasing lead time.
The systematic bias of the predicted annual cycle in the eastern equatorial Atlantic is also examined. Figure 10 shows the annual cycle of SST, with the annual mean subtracted. In the observation (Figure 10a), an annual period of mean SST dominates in the eastern equatorial Atlantic, with a visible westward extension; the warmest (coldest) mean SST in the eastern equatorial Atlantic occurs in March-April (August), with a magnitude of more than 2 °C. In the forecasts, the majority of the NMME models underestimate the intensity of the SST annual cycle, although a few models (e.g., NCEP-CFSv2) overestimate it. Additionally, the phase of the predicted SST annual cycle tends to be shifted slightly earlier than in the observation. Previous studies (e.g., [2]) have suggested that the amplitude bias of the SST annual cycle can influence the ENSO prediction skill; we therefore next focus on the relationship between the amplitude bias of the SST annual cycle and the Atlantic Niño/Niña prediction skill. Here the annual cycle of SST is defined as the mean SST minus the annual-mean SST, and the SST annual cycles for the MME and the individual models are derived from the four-month-lead forecasts, which resemble those at other lead times.
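The annual-cycle definition used here (monthly climatology minus the annual mean) can be written in a few lines. The sketch below uses a synthetic harmonic with a roughly 2 °C swing peaking in March-April, loosely mimicking the observed cycle described above; the numbers are assumptions, not the HadISST values.

```python
import numpy as np

# Sketch of the annual-cycle definition: monthly SST climatology minus the
# annual-mean SST. Synthetic harmonic, warmest around March-April (month 2-3).
months = np.arange(12)
sst_clim = 26.0 + 2.0 * np.cos(2.0 * np.pi * (months - 2.5) / 12.0)

annual_cycle = sst_clim - sst_clim.mean()          # remove the annual mean
amplitude = annual_cycle.max() - annual_cycle.min()  # peak-to-trough range (°C)
```

An amplitude bias in a model forecast would then simply be the difference between its `amplitude` and the observed one.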

Factors Responsible for the Amplitude Bias of Atlantic Niño/Niña Prediction
The previous sections have shown that the forecasts carry mean-state biases, including the warm bias of mean SST in the eastern equatorial Atlantic, the underestimated zonal gradient of mean SST in the equatorial Atlantic, and the underestimated intensity of the annual cycle. We now examine whether there is any link between these mean-state biases and the prediction biases among the NMME models.
As shown above, underestimation of the amplitude of the predicted Atlantic Niño/Niña prevails in the forecasts by the NMME models. Here the difference between the STD of the predicted Atl3 index and the STD of the observed Atl3 index (i.e., model forecast minus observation) denotes the amplitude bias in predicting Atlantic Niño/Niña. We first investigate the relationship between the mean SST biases and the bias in predicting the Atlantic Niño/Niña amplitude. Figure 11a shows the scatter diagram of the amplitude bias in predicting Atlantic Niño/Niña against the bias of mean SST averaged over the Atl3 region. The corresponding correlation coefficient is relatively low and does not exceed the 95% confidence level (here the critical value for the 95% confidence level is 0.55). Similarly, the correlation coefficient between the amplitude bias and the RMSE of the mean SST biases over the Atl3 region is low and insignificant (Figure 11b). These results indicate that the mean SST bias in the eastern equatorial Atlantic may not be responsible for the amplitude bias. Furthermore, the amplitude bias in predicting Atlantic Niño/Niña is also not significantly correlated, at the 95% confidence level, with the bias of the zonal gradient of mean SST in the equatorial Atlantic, as shown in Figure 11c. Analogous to the amplitude bias in predicting the Atl3 index, we calculated the amplitude bias of the Niño3.4 index in the nine-month-lead forecasts for each model. The relationship between the amplitude bias of the predicted Atl3 index and the amplitude bias of the predicted Niño3.4 index is shown in Figure 11d. The low and insignificant correlation coefficient between them indicates that the amplitude bias in predicting ENSO is not responsible for the amplitude bias in predicting Atlantic Niño/Niña. Note that the results above (i.e., the amplitude bias in predicting Atlantic Niño/Niña) are derived from the three-month-lead forecasts.
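The cross-model diagnostic above can be sketched as follows: compute each model's amplitude bias as STD(predicted Atl3) minus STD(observed Atl3), then correlate that bias with a candidate mean-state bias across the models. The data below are synthetic stand-ins (the observed STD, the model spread, and the warm-bias distribution are all assumptions); with 13 samples, the two-tailed 95% critical value for a Pearson correlation is about 0.55, consistent with the threshold quoted in the text.

```python
import numpy as np

# Sketch of the cross-model correlation diagnostic (synthetic values).
rng = np.random.default_rng(1)
n_models = 13                                       # MME panel count in Fig. 9/11
obs_std = 0.45                                      # hypothetical observed Atl3 STD (°C)
pred_std = rng.uniform(0.2, 0.45, size=n_models)    # most models too weak
amp_bias = pred_std - obs_std                       # negative = underestimated amplitude

mean_sst_bias = rng.normal(0.8, 0.3, size=n_models)  # unrelated warm bias (°C)
r = np.corrcoef(amp_bias, mean_sst_bias)[0, 1]

# Two-tailed 95% critical value for Pearson r with n = 13 (df = 11) is ~0.55
significant = abs(r) > 0.55
```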
We have further calculated the results for the other lead times (e.g., the results derived from the five-month-lead forecasts are shown in Figure 12) and found that the main conclusion holds. Nonetheless, the insignificant statistical results above do not imply that the mean-state bias has no impact on the prediction errors; its impact may simply be model-dependent. Our results suggest that the amplitude bias in predicting Atlantic Niño/Niña is associated with the amplitude bias of the annual cycle of SST. As shown in Figure 13, the fidelity of predicting the amplitude of Atlantic Niño/Niña is well correlated with the fidelity of predicting the amplitude of the SST annual cycle. The correlation coefficients between them for the four-, five-, and six-month-lead forecasts are 0.59, 0.58, and 0.55, respectively, all exceeding the 95% confidence level. Additionally, the correlation coefficients among the NMME models increase considerably when an outlier (i.e., the forecast results of NCEP-CFSv2) is excluded. Overall, our results indicate that the amplitude bias in predicting Atlantic Niño/Niña can be attributed to the systematic bias in predicting the amplitude of the SST annual cycle. This may be because the background mean state plays a vital role in modulating the amplitude of interannual variability. The detailed mechanism behind this will be thoroughly analyzed in the future.

Factors Responsible for the Prediction Skills Based on ACC
Based on the ACC, the MME forecast skill of the Atlantic Niño/Niña, with a longer lead time, is superior to that reported by previous studies. However, the forecast skill is limited to a lead time of around five months and cannot overcome the aforementioned "spring predictability barrier". Hence, it is worth identifying the factors responsible for the forecast skills based on the ACC.
Again, to avoid relying on factors that may be model-dependent, we attempt to identify common factors from the multi-model forecast results. Figure 14a-d show the scatter diagrams of the ACC results for the Atlantic Niño/Niña forecasts against the bias of mean SST averaged over the Atl3 region (Figure 14a), the RMSE of the mean SST biases in the Atl3 region (Figure 14b), the bias of the zonal gradient of mean SST in the equatorial Atlantic (Figure 14c), and the amplitude bias of the SST annual cycle (Figure 14d), respectively. The corresponding correlation coefficients are relatively low and insignificant, indicating that the local mean SST bias, the bias of the zonal gradient of mean SST, and the bias of the SST annual cycle are not responsible for the forecast skills based on the ACC. Since recent studies have suggested a close linkage between the interannual variability in the tropical Pacific and that in the tropical Atlantic (e.g., [9,72]), we further investigate whether the prediction skill of ENSO has an impact on the prediction skill of Atlantic Niño/Niña. The observed Atlantic Niño and Niña events and the selection criteria are described in Section 3.3. For a given model, we first calculated the forecast skill for all the selected Atlantic Niño and Niña events (e.g., the ACC of the June-July averaged Atl3 index in the April-initiated forecast is 0.73 in CanCM4i), and then calculated the forecast skill for the Niño3.4 index in the preceding boreal winter (e.g., the ACC of the preceding December-February averaged Niño3.4 index in the May-initiated and June-initiated forecasts is 0.91 and 0.94, respectively, in CanCM4i). Finally, we obtained the relationship, among the NMME models, between the prediction skill for the Atlantic Niño/Niña events and the prediction skill for the preceding ENSO (i.e., the preceding December-February averaged Niño3.4 index).
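The ACC score used for the event-based skill above is simply the correlation between the predicted and observed anomaly index over the selected events. A minimal sketch, with hypothetical event values (the Atl3 anomalies below are illustrative, not the observed events):

```python
import numpy as np

def acc(pred, obs):
    """Anomaly correlation coefficient: remove each series' mean, then correlate."""
    p = pred - pred.mean()
    o = obs - obs.mean()
    return float(np.sum(p * o) / np.sqrt(np.sum(p**2) * np.sum(o**2)))

# Hypothetical June-July averaged Atl3 anomalies (°C) for six selected events
obs_atl3 = np.array([0.8, -0.6, 0.5, -0.9, 0.7, -0.4])
# Toy forecast: damped amplitude plus small errors
pred_atl3 = 0.6 * obs_atl3 + np.array([0.1, -0.05, 0.0, 0.1, -0.1, 0.05])

score = acc(pred_atl3, obs_atl3)
```

Note that the ACC is insensitive to a uniform amplitude damping, which is why the amplitude bias and the ACC-based skill are examined as separate questions in this section.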
As shown in Figure 15a, the correlation coefficient reaches 0.66, showing a significantly positive correlation between the Atlantic Niño/Niña prediction skill and the preceding ENSO prediction skill. Consistently, the results based on the eight-month-lead forecasts of the Niño3.4 index show the same relationship, as shown in Figure 15b. Our results indicate that how well a model predicts the Atlantic Niño/Niña may greatly depend on how well it predicts the preceding ENSO, or the SST anomalies in the central-eastern equatorial Pacific.
It is worth mentioning that Hu et al. [36], based on forecast results from a single dynamical model, might be among the first to suggest that the prediction skill of ENSO influences the prediction skill of Atlantic variability. Our results based on the NMME models further confirm this viewpoint: the Atlantic Niño/Niña prediction skill, in particular the ACC, depends primarily on the preceding ENSO prediction skill. This result is consistent with some recent studies (e.g., [3,14,73,74]) suggesting that ENSO can impact the Atlantic Niño/Niña via the inter-basin interaction between the tropical Pacific and Atlantic.

Conclusions
Atlantic Niño/Niña is one of the dominant modes of interannual variability in the climate system and exerts great influence on the weather and climate of the surrounding continental regions. However, previous studies showed that the prediction skill of the SST anomalies associated with Atlantic Niño/Niña reaches only around two months, or at best three months, a relatively low prediction skill. As the latest NMME models have released their forecast results, this study assessed the prediction skill of Atlantic Niño/Niña in the NMME models based on both deterministic and probabilistic measures, and investigated the possible factors responsible for the forecast errors. The main conclusions are summarized below.
(1) Almost all the NMME models underestimate the amplitude of Atlantic Niño/Niña, and the amplitude bias generally increases with increasing lead time. From the perspective of the individual models, the prediction skill of Atlantic Niño/Niña for the majority of the NMME models reaches three months. Specifically, most of the models are capable of predicting Atlantic Niño/Niña at a three-month lead with an RMSE of less than 0.4, and four models (CanCM4i, CanSIPSv2, CMC1-CanCM3, and CMC2-CanCM4) keep the RMSE below 0.4 at a four-month lead. When one STD of the observed Atl3 index is chosen as the threshold value, most of the models are able to predict the Atl3 index at a seven-month lead or even a 12-month lead. When 0.6 is chosen as the cut-off value for the ACC, the prediction skills of half of the NMME models reach three months. Among the NMME models, CanCM4i and CanSIPSv2 show the best skill in predicting Atlantic Niño/Niña. Two representative models were selected for a further assessment of the prediction skill in a probabilistic sense; the results based on the probabilistic measures (BSS, RPSS, and ROC) agree with each other and are generally consistent with those based on the deterministic measures.
(2) The MME of the NMME models shows better prediction skill than any individual model. Specifically, the prediction skill of the MME reaches six (more than four) months when 0.5 (0.6) is chosen as the cut-off value for the ACC. As for the RMSE, the MME result stays far below one STD of the observed Atl3 index even for the 12-month-lead forecast, and the prediction skill of the MME reaches nearly five months when 0.4 is chosen as the threshold value. Therefore, the prediction skill of the MME reaches around five months, indicating that the MME method is an effective approach for reducing forecast errors.
(3) The prediction skill of Atlantic Niño/Niña shows clear seasonality. Both the ACC and RMSE results show that the prediction skill generally exceeds six months for forecasts initiated from May to November, but is limited to within four months for forecasts initiated in boreal winter. As the prediction skill shows a marked dip across boreal spring, the prediction of Atlantic Niño/Niña in the NMME models suffers a "spring predictability barrier".
(4) More detailed assessments document that the prediction skill for Atlantic Niña is higher than that for Atlantic Niño, and that the prediction skill in the developing phase is better than that in the decaying phase. A preliminary analysis reveals that, in all the models, the SNR for the Atlantic Niña prediction is obviously larger than that for the Atlantic Niño prediction, indicating that Atlantic Niña is more predictable than Atlantic Niño. The contrasting potential predictability estimated by the SNR may partly explain why the prediction skill of Atlantic Niña is higher than that of Atlantic Niño in the NMME models.
(5) Further analysis shows that the amplitude bias of the predicted Atlantic Niño/Niña is primarily attributed to the amplitude bias of the annual cycle of SST, while the mean-state bias (e.g., the mean SST bias in the Atl3 region) and the amplitude bias of the Niño3.4 index are not common factors among the models. Generally speaking, a weak annual cycle of SST corresponds to an underestimation of the Atlantic Niño/Niña variability, and vice versa; the detailed reason behind this needs further investigation. From the perspective of ACC scores, we found that the prediction skill for the Atlantic Niño/Niña events relies, to a large extent, on the prediction skill for the preceding boreal winter (December-February) averaged Niño3.4 index (i.e., the preceding ENSO).
The factors affecting seasonal prediction skill are varied and may be model-dependent. For instance, aspects of the model formulation, such as the initialization method, the model resolution, and the physical parameterization schemes, can greatly influence the seasonal prediction skill of a given model. Thus, our strategy in this study was to identify the common factors influencing the prediction skill of Atlantic Niño/Niña with the aid of multiple models from the latest NMME. Based on the multi-model results, it is suggested that the prediction skill of Atlantic Niño/Niña may benefit from a more realistic annual cycle in the eastern equatorial Atlantic sector and an improved ENSO prediction skill in the dynamical models.

Data Availability Statement: Publicly available datasets were analyzed in this study. The NMME model datasets can be obtained from https://www.cpc.ncep.noaa.gov/products/NMME/data.html (accessed on 31 May 2021), and the HadISST data can be obtained from https://www.metoffice.gov.uk/hadobs/hadisst/data/download.html (accessed on 31 May 2021).