Multi-Model Ensemble Sub-Seasonal Forecasting of Precipitation over the Maritime Continent in Boreal Summer

: The Maritime Continent (MC) is a critical region with unique geographical conditions and signiﬁcant monsoon activities that plays a vital role in global climate variation. In this study, the weekly prediction of precipitation over the MC during boreal summer (from May to September) was analyzed using the 12-year reforecasts data from ﬁve Sub-seasonal to Seasonal (S2S) models, including the China Meteorological Administration (CMA), the European Centre for Medium-Range Weather Forecasts (ECMWF), Environment and Climate Change Canada (ECCC), the National Centers for Environmental Prediction (NCEP), and the Met O ﬃ ce (UKMO). The result shows that, compared with the individual models, our newly derived median multi-model ensemble (MME) can signiﬁcantly improve the prediction skill of sub-seasonal precipitation in the MC. Both the Temporal Correlation Coe ﬃ cient (TCC) skill and the Pattern Correlation Coe ﬃ cient (PCC) skill reached 0.6 in lead week 1, dropped the following week, did not exceed 0.2 in lead week 3, and then lost their signiﬁcance. The results show higher prediction skill near the Equator than in the north at 10 ◦ N. It is di ﬃ cult to make e ﬀ ective predictions with the models beyond three weeks. The prediction ability of the median MME improves signiﬁcantly as the total number of model members increases. The prediction performance of the median MME depends not only on the diversity of models but also on the number of model members. Moreover, the prediction skill is particularly sensitive to the intensity and phase of Boreal Summer Intraseasonal Oscillation 1 (BSISO1) with the highest skills appearing at initial phases 1 and 5.


Introduction
The Maritime Continent (MC), located at the junction of the Indian Ocean and the Pacific Ocean, consists of several islands and shallow seas [1]. It is an essential area in the Northern and Southern Hemispheres that interacts through the cross-equatorial flow. Asian-Australian monsoon activity is

Data
Data products generated from five S2S models are used in this study, namely the China Meteorological Administration (CMA), ECMWF, Environment and Climate Change Canada (ECCC), the National Centers for Environmental Prediction (NCEP), and the Met Office (UKMO). Boreal summer is defined as May-June-July-August-September (MJJAS). Noting that the S2S database was not produced following a consistent protocol, basic information on the 5 models is shown in Table 1. The analysis period of the five model outputs is unified to 1999-2010 MJJAS. Each month consists of four start dates; the start dates of May are shown in Figure 1. The observational daily precipitation data were provided by the Global Precipitation Climatology Project (GPCP) [55] with a horizontal resolution of 1 • × 1 • latitude/longitude. As part of the WCRP, the GPCP aims to observe and estimate global precipitation. It is based on more than 6000 routine surface observation stations and integrates the satellite observational results. For the purpose of comparing the results of models and observation in the MC, model and observational data were consistently interpolated onto the 2.5 • × 2.5 • grid by using the bilinear interpolation. In order to examine the relationship between sub-seasonal precipitation and ISO in the MC, the Real-time Multivariate MJO (RMM) index [56,57] and the BSISO index [58] were also used. The MC domain was taken as 90 • E-150 • E, 10 • S-20 • N.

Data
Data products generated from five S2S models are used in this study, namely the China Meteorological Administration (CMA), ECMWF, Environment and Climate Change Canada (ECCC), the National Centers for Environmental Prediction (NCEP), and the Met Office (UKMO). Boreal summer is defined as May-June-July-August-September (MJJAS). Noting that the S2S database was not produced following a consistent protocol, basic information on the 5 models is shown in Table 1. The analysis period of the five model outputs is unified to 1999-2010 MJJAS. Each month consists of four start dates; the start dates of May are shown in Figure 1. The observational daily precipitation data were provided by the Global Precipitation Climatology Project (GPCP) [55] with a horizontal resolution of 1° × 1° latitude/longitude. As part of the WCRP, the GPCP aims to observe and estimate global precipitation. It is based on more than 6000 routine surface observation stations and integrates the satellite observational results. For the purpose of comparing the results of models and observation in the MC, model and observational data were consistently interpolated onto the 2.5° × 2.5° grid by using the bilinear interpolation. In order to examine the relationship between subseasonal precipitation and ISO in the MC, the Real-time Multivariate MJO (RMM) index [56,57] and the BSISO index [58] were also used. The MC domain was taken as 90° E-150° E, 10° S-20° N.

Construction of MME
As can be seen from Table 1 and Figure 1, it is obvious that the five S2S models do not follow an agreed-upon protocol; there is also no standard construction of the MME due to the start dates of each model being different. Thus, the construction of an S2S-MME is quite necessary but a challenging issue. This study tries to make such an S2S-MME possible by constructing a median MME

Construction of MME
As can be seen from Table 1 and Figure 1, it is obvious that the five S2S models do not follow an agreed-upon protocol; there is also no standard construction of the MME due to the start dates of each model being different. Thus, the construction of an S2S-MME is quite necessary but a challenging issue. This study tries to make such an S2S-MME possible by constructing a median MME instead of traditional arithmetic MME (e.g., Ren et al. [47]), also referring to the work of Specq et al. [54]. We think that median MME is a better method in terms of different S2S models. The construction of the MME refers to the start date of the CMA. Table 2 is an example of constructing MME with three different start dates (30 April, 1 May, 5 May) for the forecast of the target week (6)(7)(8)(9)(10)(11)(12). In this example, the CMA's lead-a-week forecast for 6 to 12 May can choose 1 May as the start date. For the ECCC, the nearest start date is 5 May. Therefore, a given forecast target week may correspond to different start dates for different models. The number of ensemble members is not exactly the same for each model (CMA, ECCC, and NCEP each have 4 members, ECMWF has 11 members, and UKMO has 3 members); therefore, a uniform number of members was selected to construct MME. In this study, we considered constructing MMEs of 15, 10, and 5 members (including 3, 2, and 1 from each model), respectively. The median performance ensemble of each model was selected to construct the MME. For example, CMA has four ensemble members (a, b, c, d), and thus we can get four ensembles (abc, abd, acd, bcd) constructed by three members. We chose the median of four ensembles as the ensemble of CMA, and the MME was composed of the respective median ensemble of five models.

Forecasting Skill Metrics
In this paper, WMO-recommended standards are used to evaluate prediction performance [59]. The Temporal Correlation Coefficient (TCC) is used to measure the prediction skill. TCC can represent the temporal agreement of model forecast statistical significance and is able to acquire a distribution of prediction skill by calculating the TCC of each grid point. The calculation formation is given as: where i denotes the start date and j denotes the target week. x i,j represents the predicted week j mean anomalies starting on date i, and x i represents the average of predicted mean anomalies starting on date i of each week. y i expresses the observation anomalies in week j, and y expresses the average of y j . N is the number of time samples. The range of TCC is between −1 and 1, and the closer the TCC is to 1, the higher the forecasting skill. Unlike TCC, the Pattern Correlation Coefficient (PCC) reflects the spatial agreement of the model forecast and acquires a variation of prediction skill by calculating PCC of different times. The calculation formula is as follows: where j denotes the target week and i denotes the grid point. ∆x i,j and ∆y i,j represent the predicted and observational week j mean anomalies on grid i, respectively. M is the total number of the grid point over the MC. The PCC also ranges from −1 to 1. A larger PCC indicates a more consistent spatial pattern between model forecast and observation.

Prediction Skills of Individual Models and MME
The TCC skills of the five individual models and the MME are shown in Figure 2. The ensemble member of each model is unified for three members. The result shows that the TCC skill of each model decreases with the increase in lead time. During lead week 1, TCC of most of the MC regions is greater than 0.4. After lead week 1, prediction skill rapidly drops. The prediction skill of the MME is significantly larger than the single model, and most regions can still reach above 0.3 in lead week 2.
Compared with the single model, the MME still shows significant skills in lead week 3, and all models lose their prediction ability in lead week 4. The performance of the ECMWF and NCEP is relatively better than the other three models. Furthermore, the results show that prediction skill by the Equator tends to be higher than that north of the Equator.
Atmosphere 2020, 11, x FOR PEER REVIEW 5 of 17 number of the grid point over the MC. The PCC also ranges from −1 to 1. A larger PCC indicates a more consistent spatial pattern between model forecast and observation.

Prediction Skills of Individual Models and MME
The TCC skills of the five individual models and the MME are shown in Figure 2. The ensemble member of each model is unified for three members. The result shows that the TCC skill of each model decreases with the increase in lead time. During lead week 1, TCC of most of the MC regions is greater than 0.4. After lead week 1, prediction skill rapidly drops. The prediction skill of the MME is significantly larger than the single model, and most regions can still reach above 0.3 in lead week 2.
Compared with the single model, the MME still shows significant skills in lead week 3, and all models lose their prediction ability in lead week 4. The performance of the ECMWF and NCEP is relatively better than the other three models. Furthermore, the results show that prediction skill by the Equator tends to be higher than that north of the Equator. The prediction skill of the MC is significantly higher than those of the middle and high latitudes [60]. As shown in the MME, high-skill regions are concentrated near the Equator over ocean; Sumatra, Kalimantan and the Philippines have higher skill over land. There is no significant difference through analyzing the spatial average of the TCC skill over the ocean and the land. Interestingly, the skill over ocean is more significant after lead week 3, which indicates that the skill over ocean is higher compared to that over land. Figure 3 shows the interannual variations of the PCC skills for the five individual models and the MME. Similar to TCC, the PCC skill of each model decreases with the increase in lead time, and the prediction skill of the MME is relatively higher than individual models. The PCC skills of all models drop significantly in lead week 3 but are still positive for most cases, even for lead week 4.
Atmosphere 2020, 11, x FOR PEER REVIEW 7 of 17  Figure 4. The skills are relatively high in lead week 1 and rapidly drop in lead week 2. The ECMWF has the best performance, with the NCEP and UKMO very close behind, followed by the CMA and ECCC models. MME can effectively improve the forecast skill and is still significant in lead week 3, although the skills of all models are not significant. The forecast skill after 3 weeks decreases slowly and is not statistically significant as the lead week increases. This indicates that the MME lost its prediction skill after three weeks. In  Figure 4. The skills are relatively high in lead week 1 and rapidly drop in lead week 2. The ECMWF has the best performance, with the NCEP and UKMO very close behind, followed by the CMA and ECCC models. MME can effectively improve the forecast skill and is still significant in lead week 3, although the skills of all models are not significant. The forecast skill after 3 weeks decreases slowly and is not statistically significant as the lead week increases. This indicates that the MME lost its prediction skill after three weeks. In addition, the time average of the PCC is relatively higher than the spatial average of the TCC, indicating that the model is better at capturing the spatial distribution characteristics of precipitation relative to its temporal evolution.
Atmosphere 2020, 11, x FOR PEER REVIEW 8 of 17 indicating that the model is better at capturing the spatial distribution characteristics of precipitation relative to its temporal evolution. To explore the impact of ensemble members on the prediction skill of the MME, MMEs of 5, 10, and 15 members were constructed. We call this the median MME because the MMEs are constructed with five models, as shown in Figure 5. The prediction skill of the MME relies on ensemble members, with more members denoting higher skill when the number of models is constant, which is reflected by both the TCC and PCC skills. To compare the prediction ability of the MME with that of the individual model under the same number of members, the ECMWF model of 10 members ensemble was also plotted. Although the ECMWF model has excellent performance (consistent with the results in Figures 2-4), the MME can still improve the prediction ability.
In order to explore the impact of the diversity of models on the prediction ability of the MME, we constructed two types of 15-member MMEs for comparison. The ECMWF model with 11 members and the NCEP model with four members were selected to construct a 15-member MME, known as the optimal MME. As shown in Figure 5, impressively, the median MME skill is larger than the optimal MME. The diversity of models significantly contributes to the performance of the MME than the single model with better performance. This shows that the number and the diversity of models with a certain number of ensemble members needed to be taken into consideration when constructing an MME. Some suitable ensemble methods also have a great impact on improving the prediction performance of the MME, which is also worth further exploring. To explore the impact of ensemble members on the prediction skill of the MME, MMEs of 5, 10, and 15 members were constructed. We call this the median MME because the MMEs are constructed with five models, as shown in Figure 5. The prediction skill of the MME relies on ensemble members, with more members denoting higher skill when the number of models is constant, which is reflected by both the TCC and PCC skills. To compare the prediction ability of the MME with that of the individual model under the same number of members, the ECMWF model of 10 members ensemble was also plotted. Although the ECMWF model has excellent performance (consistent with the results in Figures 2-4), the MME can still improve the prediction ability.
In order to explore the impact of the diversity of models on the prediction ability of the MME, we constructed two types of 15-member MMEs for comparison. The ECMWF model with 11 members and the NCEP model with four members were selected to construct a 15-member MME, known as the optimal MME. As shown in Figure 5, impressively, the median MME skill is larger than the optimal MME. The diversity of models significantly contributes to the performance of the MME than the single model with better performance. This shows that the number and the diversity of models with a certain number of ensemble members needed to be taken into consideration when constructing an MME. Some suitable ensemble methods also have a great impact on improving the prediction performance of the MME, which is also worth further exploring.

Association of MME Prediction with MJO and BSISO
ISO plays a crucial role in the prediction of sub-seasonal timescales. Madden and Julian [61,62] defined the tropical atmospheric low-frequency oscillation MJO as the most significant ISO in tropical regions, especially in winter. The MJO convection signal is strongest over the tropical Indian Ocean and Western Pacific. It propagates eastward with a speed of 5 m s -1 , circulating the globe within a period of 30-60 days. The MJO index [56] can visually reflect the position and propagation of MJO according to the two-dimensional spatial phase diagram composed of RMM.
In addition to the eastward propagation, ISO in the boreal summer also has a significant northward propagating component, which has been popularly called BSISO [63]. The BSISO index proposed by Lee et al. [25] include BSISO1 and BSISO2 indices, which describe both the eastward and northward movements of BSISO within a period of 10-60 days. The BSISO1 index describes the typical northward and the eastward propagation within a period of 30 to 60 days.
The PCC skill of precipitation in lead week 1 was scattered against the amplitudes of BSISO1 and MJO to explore the correlation between the prediction skill of the 15-member median MME and the ISO intensity by linear regression. As shown in Figure 6, the prediction skill of the MME is weakly positively correlated with MJO intensity, and the correlation coefficient is around 0.12. Compared with MJO, BSISO1 has a prominent correlation relationship with a correlation coefficient of 0.37. It shows that the amplitude of BSISO1 has a significant influence on weekly precipitation prediction; in other words, BSISO1 provides important sub-seasonal predictability for precipitation in the MC.

Association of MME Prediction with MJO and BSISO
ISO plays a crucial role in the prediction of sub-seasonal timescales. Madden and Julian [61,62] defined the tropical atmospheric low-frequency oscillation MJO as the most significant ISO in tropical regions, especially in winter. The MJO convection signal is strongest over the tropical Indian Ocean and Western Pacific. It propagates eastward with a speed of 5 m s −1 , circulating the globe within a period of 30-60 days. The MJO index [56] can visually reflect the position and propagation of MJO according to the two-dimensional spatial phase diagram composed of RMM.
In addition to the eastward propagation, ISO in the boreal summer also has a significant northward propagating component, which has been popularly called BSISO [63]. The BSISO index proposed by Lee et al. [25] include BSISO1 and BSISO2 indices, which describe both the eastward and northward movements of BSISO within a period of 10-60 days. The BSISO1 index describes the typical northward and the eastward propagation within a period of 30 to 60 days.
The PCC skill of precipitation in lead week 1 was scattered against the amplitudes of BSISO1 and MJO to explore the correlation between the prediction skill of the 15-member median MME and the ISO intensity by linear regression. As shown in Figure 6, the prediction skill of the MME is weakly positively correlated with MJO intensity, and the correlation coefficient is around 0.12. Compared with MJO, BSISO1 has a prominent correlation relationship with a correlation coefficient of 0.37. It shows that the amplitude of BSISO1 has a significant influence on weekly precipitation prediction; in other words, BSISO1 provides important sub-seasonal predictability for precipitation in the MC. In order to further explore the relationship between BSISO1 phase and MME prediction, the prediction skill for different BSISO1 phases is displayed in Figure 7. The PCC skill for each phase is the highest in lead week 1, and skill decreases with the increase in lead week. The skill of phase 1 and phase 5 is relatively high and declines slowly after lead week 2, indicating that there is relatively high predictability when initiated in phase 1 and phase 5. The skill initiated in phase 3 is the lowest. Since phase 1 and phase 5 of BSISO1 have relatively high predictability, the two phases with the prominent prediction skill are as shown in Figures 8 and 9skill of phase 1 is higher). In phase 1 ( Figure  8), during the 1-4 weeks of observation, there is a rain belt between the west of Sumatra and Kalimantan Island, and the dry area north of 10° N shrinks after week 1. There is a trend of eastward movement over time; this trend can be basically simulated in the first two weeks, and there is some simulation capability for the dry regions of the eastern Philippines, but it is difficult to simulate the In order to further explore the relationship between BSISO1 phase and MME prediction, the prediction skill for different BSISO1 phases is displayed in Figure 7. The PCC skill for each phase is the highest in lead week 1, and skill decreases with the increase in lead week. The skill of phase 1 and phase 5 is relatively high and declines slowly after lead week 2, indicating that there is relatively high predictability when initiated in phase 1 and phase 5. The skill initiated in phase 3 is the lowest. In order to further explore the relationship between BSISO1 phase and MME prediction, the prediction skill for different BSISO1 phases is displayed in Figure 7. The PCC skill for each phase is the highest in lead week 1, and skill decreases with the increase in lead week. The skill of phase 1 and phase 5 is relatively high and declines slowly after lead week 2, indicating that there is relatively high predictability when initiated in phase 1 and phase 5. The skill initiated in phase 3 is the lowest. Since phase 1 and phase 5 of BSISO1 have relatively high predictability, the two phases with the prominent prediction skill are as shown in Figures 8 and 9skill of phase 1 is higher). In phase 1 ( Figure  8), during the 1-4 weeks of observation, there is a rain belt between the west of Sumatra and Kalimantan Island, and the dry area north of 10° N shrinks after week 1. There is a trend of eastward movement over time; this trend can be basically simulated in the first two weeks, and there is some simulation capability for the dry regions of the eastern Philippines, but it is difficult to simulate the Since phase 1 and phase 5 of BSISO1 have relatively high predictability, the two phases with the prominent prediction skill are as shown in Figures 8 and 9 skill of phase 1 is higher). In phase 1 (Figure 8), during the 1-4 weeks of observation, there is a rain belt between the west of Sumatra and Kalimantan Island, and the dry area north of 10 • N shrinks after week 1. There is a trend of eastward movement over time; this trend can be basically simulated in the first two weeks, and there is some simulation capability for the dry regions of the eastern Philippines, but it is difficult to simulate the west of the Philippines (Figure 8f,g). Furthermore, models in median MME failed to effectively simulate the propagation and position of convection after week 3.
Atmosphere 2020, 11, x FOR PEER REVIEW 11 of 17 west of the Philippines (Figure 8f,g). Furthermore, models in median MME failed to effectively simulate the propagation and position of convection after week 3. In phase 5 (Figure 9), the convection occurs near the Bay of Bengal to the South China Sea and the northwest Pacific region in the north of New Guinea. There are significant dry regions in Kalimantan Island and west of Sumatra. Convection tends to weaken and propagates toward the northeast with the increase in lead week. However, the simulation ability is weak and lasts for just two weeks. The models in the median MME can capture the convection cells in mainland Southeast Asia, near the South China Sea and east of the Philippines, but, in the Indochina Peninsula, the models In phase 5 (Figure 9), the convection occurs near the Bay of Bengal to the South China Sea and the northwest Pacific region in the north of New Guinea. There are significant dry regions in Kalimantan Island and west of Sumatra. Convection tends to weaken and propagates toward the northeast with the increase in lead week. However, the simulation ability is weak and lasts for just two weeks. The models in the median MME can capture the convection cells in mainland Southeast Asia, near the South China Sea and east of the Philippines, but, in the Indochina Peninsula, the models tend to underestimate precipitation in lead week 1 and overestimate precipitation in lead week 2. Moreover, the capture of the eastward signal lags for models in the median MME (Figure 9c,g).
tend to underestimate precipitation in lead week 1 and overestimate precipitation in lead week 2. Moreover, the capture of the eastward signal lags for models in the median MME (Figure 9c,g).

Summary and Discussions
In this paper, 12-year (1999-2010) reforecasts from five S2S models, including CMA, ECMWF, ECCC, NCEP, and UKMO, are used to study the sub-seasonal predictability of the precipitation of the MC during the boreal summer. Based on the 20 start dates (May-September), a median MME was constructed, and the multi-week prediction of precipitation in the MC was evaluated based on TCC and PCC skill scores. The relationship between prediction skill in this region and the MJO and BSISO is examined.

Summary and Discussions
In this paper, 12-year (1999-2010) reforecasts from five S2S models, including CMA, ECMWF, ECCC, NCEP, and UKMO, are used to study the sub-seasonal predictability of the precipitation of the MC during the boreal summer. Based on the 20 start dates (May-September), a median MME was constructed, and the multi-week prediction of precipitation in the MC was evaluated based on TCC and PCC skill scores. The relationship between prediction skill in this region and the MJO and BSISO is examined.
It is found that there are significant differences between the abilities of the five individual models and their Multi-model ensemble. The prediction skills of all models decrease with the increasing the lead week. The performances of ECMWF and NCEP are relatively better than those of the other models. The prediction skill of the MME is significantly larger than any individual models. According to the spatial distribution of the TCC skill, the prediction skill is higher in low latitudes than that in high latitudes, and the skill over ocean is more significant than that over land after lead week 3. PCC skills exhibit significantly interannual variations. Both skill measures are relatively high in lead week 1, then rapidly drop in lead week 2. In lead week 3, the skills of all models are not significant except for the MME. Therefore, the models have no useful prediction skill for summer precipitation in the MC after 3 weeks.
The number of ensemble members has a significant impact on the prediction skill of the median MME, with more members denoting higher skill when the number of models is constant. When ensemble members are constant, median MME can improve the prediction ability compared with the best-performing model. It is obvious in both that the improvement in prediction skill comes from the error elimination of the individual model. Furthermore, the diversity of models significantly contributes to the performance of the median MME compared to the individual model with better performance, so the diversity of models should be given priority to construct the MME, and there is no doubt that the importance of both is worth exploring further.
The intensity of ISO has a significant influence on weekly precipitation prediction; this characteristic is especially true with the BSISO. Precipitation prediction skills for different BSISO1 phases are also quite different, and skill decreases with the increase in lead week. There is relatively high predictability in phase 1 and phase 5. In phase 1, a trend of eastward movement can be basically simulated in the first two weeks, but models in the median MME failed to effectively simulate the propagation and position of convection after week 3. In phase 5, models in the median MME can basically simulate the convection cells in mainland Southeast Asia, near the South China Sea and east of the Philippines, within 2 weeks, based on a composite analysis for 1-4 weeks of precipitation anomalies. In the Indochina Peninsula, the models tend to underestimate precipitation anomalies in lead week 1 and overestimate precipitation anomalies in lead week 2. Moreover, the capture of eastward the signal lags for models in the median MME.
Although we have demonstrated that the MME is an effective way to improve the prediction skill of sub-seasonal precipitation in the MC, a considerable amount of further research is warranted. The construction of the MME requires the participation of more S2S models. The proportion of excellent models and the diversity of models require further research when constructing the MME.
The key physical processes [64] and initial value problems [65] of the S2S models still need in-depth diagnosis and analysis. Considering the interaction between sub-seasonal scale and other scale signals is also an important direction to improve short-term climate prediction [66]. Furthermore, applying dynamic-statistical prediction methods [67][68][69][70][71] on the base of the MME is necessary for the improvement of S2S prediction.

Conflicts of Interest:
The authors declare no conflict of interest.