Deterministic and Probabilistic Evaluation of Sub-Seasonal Precipitation Forecasts at Various Spatiotemporal Scales over China during the Boreal Summer Monsoon

Abstract: Skillful sub-seasonal precipitation forecasts can provide valuable information for both flood and drought disaster mitigation. This study evaluates both deterministic and probabilistic sub-seasonal precipitation forecasts of the ECMWF, ECCC, and UKMO models derived from the Subseasonal to Seasonal (S2S) Database at various spatiotemporal scales over China during the boreal summer monsoon. The Multi-Source Weighted-Ensemble Precipitation, version 2 (MSWEP V2), is used as the reference dataset to evaluate the forecast skills of the models. The results suggest that skillful deterministic sub-seasonal precipitation forecasts are found when the lead time is within 2 weeks. The deterministic forecast skills decline quickly when the lead time is beyond 2 weeks. For probabilistic forecasts, positive ranked probability skill scores (RPSS) are likewise only found when the lead time is within 2 weeks. Multimodel ensembling helps to improve forecast skills by removing large negative skill scores in northwestern China. The forecast skills are also improved at larger spatial scales or longer temporal scales. However, the improvement is only observed for certain regions where the predictable low-frequency signals remain at longer lead times. The composite analysis suggests that both the El Niño–Southern Oscillation (ENSO) and the Madden–Julian Oscillation (MJO) have an impact on weekly precipitation variability over China. The forecast skills are found to be enhanced during active ENSO and MJO phases, and the enhancement is more pronounced during active MJO phases.


Introduction
Skillful sub-seasonal precipitation forecasts (between 2 weeks and 3 months) can provide valuable information for applications such as flood and drought mitigations [1][2][3]. However, precipitation forecasts at such a time scale remain challenging. Compared to short to medium range forecasts, the memory of atmospheric initial conditions is lost for sub-seasonal forecasts. On the other hand, the slowly varying boundary conditions do not have a substantial impact on sub-seasonal forecasts, as the time scale is too short [4,5].
A growing number of studies have investigated the role played by possible sources of sub-seasonal predictability. The Madden-Julian Oscillation (MJO) is one of the leading potential sources of sub-seasonal predictability [6,7]. Other processes in the climate system, such as stratosphere-troposphere interactions [8,9], soil moisture conditions [10,11], snow cover conditions [12,13], and ocean conditions [14,15], are also investigated.
With a better understanding of sub-seasonal predictability and the improvement of Global Climate Models (GCMs), sub-seasonal forecast skills have improved in recent years [5]. Vitart et al. [1] found that the GCMs were able to predict the occurrence of a strong MJO event in March 2015 more than 2 weeks in advance. Vitart and Robertson [2] found shifts.
The East Asia Summer Monsoon (EASM) then moves back to South China again in September and makes way for the winter monsoon in October [34,35]. Extreme flood and drought disasters are commonly caused by heavy or deficient rainfall during the boreal summer monsoon from May to October. Accurate sub-seasonal precipitation forecasts during the boreal summer monsoon over China can provide valuable information for flood and drought disaster prevention. In our previous study, we only used the Bayesian Joint Probability (BJP) approach to calibrate the ECMWF forecasts derived from the S2S Database [36]. However, the model systems in the S2S Database are highly diverse, so it is important to compare the different GCMs. It is also important to assess the benefits of the multimodel ensemble (MME) approach for sub-seasonal precipitation forecasts over China, which has not been studied yet. In addition, we assessed the performance of the BJP-calibrated forecasts using the CRPS skill score, which provided an overall evaluation of ensemble forecasts, whereas the probability forecasts of multi-category events had not been studied yet. In this study, we first evaluate deterministic sub-seasonal precipitation forecast skills from both single GCM outputs and a multimodel ensemble mean of GCMs from May to October over China. An extended logistic regression model is then built to evaluate the probabilistic forecast skills of multi-category events at various spatiotemporal scales.
The remainder of the paper is structured as follows. The GCMs, the observed data, the evaluation metrics, and the sources of sub-seasonal predictability are introduced in Section 2. Section 3 presents the results of both deterministic and probabilistic sub-seasonal precipitation forecasts at various spatiotemporal scales. The impact of the El Niño-Southern Oscillation (ENSO) and MJO on sub-seasonal predictability is also shown in Section 3. We discuss the results in Section 4, and the conclusions are drawn in Section 5.

GCM Models and Reference Dataset
The World Weather Research Program (WWRP) and the World Climate Research Program (WCRP) launched the Sub-seasonal to Seasonal (S2S) Prediction Project [3,37]. An extensive S2S database of up-to-60-day forecasts produced by Global Climate Models (GCMs) has been developed, containing both near real-time predictions and hindcasts (reforecasts) provided by 11 operational or research centers [1]. The dataset is now archived in data servers at the European Centre for Medium-Range Weather Forecasts (ECMWF; http://apps.ecmwf.int/datasets/data/s2s/, last accessed on 30 January 2021) and the China Meteorological Administration (CMA; http://s2s.cma.cn/, last accessed on 30 January 2021). In this study, we evaluate sub-seasonal precipitation forecasts for the ECMWF model, the Environment and Climate Change Canada (ECCC) model, and the United Kingdom's Met Office (UKMO) model retrieved from the S2S Database (Table 1). As all three models have an on-the-fly production cycle, we use hindcasts corresponding to model versions in the year 2020.
An accurate and reliable precipitation dataset is also crucial for the assessment of precipitation forecasts. Many datasets are developed by merging precipitation estimates from gauges, satellites, and numerical models [38,39]. In this study, the Multi-Source Weighted-Ensemble Precipitation, version 2 (MSWEP V2), dataset is used to evaluate the forecast skill of the models. The MSWEP V2 dataset spans from 1979 to 2017 with high spatial (0.1°) and temporal (3-h) resolution. Compared to other gridded datasets, the MSWEP V2 exhibits more realistic spatial patterns and higher accuracy over land [40][41][42].
Atmosphere 2021, 12, 1049

Evaluation Strategy and Skill Metrics
The common evaluation period for the three selected models is 2000-2017, constrained by the availability of both hindcasts and observations. The three selected models do not have the same hindcast frequency and start dates (Table 1). For a fair comparison, we select hindcasts from the ECMWF model that have the same start dates as the ECCC and UKMO models. A multimodel ensemble (MME) evaluation is also performed by averaging the ECMWF, ECCC, and UKMO forecasts with the same start dates. The ECMWF, ECCC, UKMO, and MME daily forecasts are then aggregated to weekly and fortnight temporal scales by applying rolling 7-day and 14-day window averages across all lead times. The week 1, week 2, week 3, and week 4 precipitation forecasts are derived from 7-day rolling window averages with a lead time of 0 days, 7 days, 14 days, and 21 days, respectively, while the week 1-2 and week 3-4 precipitation forecasts are derived from 14-day rolling window averages with a lead time of 0 days and 14 days, respectively. Regional precipitation forecasts are calculated by averaging forecasts within each hydroclimatic region shown in Figure 1. Figure 2 presents the mean and the coefficient of variation of daily precipitation over China during the boreal summer monsoon. The precipitation amount is higher in southeastern China due to the impact of the East Asia Summer Monsoon (EASM). In comparison, limited precipitation is observed in northwestern China. However, the variability of daily precipitation in these dry regions is the highest among all regions.
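The aggregation steps described above can be illustrated with a minimal sketch. It assumes equal-weight averaging of the three models' ensemble-mean forecasts and a 32-day forecast length; the function names, the synthetic gamma-distributed data, and the forecast length are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def multimodel_mean(forecasts):
    """Equal-weight MME: average forecasts that share the same start date.

    forecasts: list of arrays with identical shape (..., n_lead_days).
    """
    return np.mean(np.stack(forecasts, axis=0), axis=0)

def rolling_mean(daily, window):
    """Rolling `window`-day mean along the last (lead-day) axis.

    Output index k is the mean of lead days k .. k+window-1, i.e. the
    window that starts at a lead time of k days.
    """
    daily = np.asarray(daily, dtype=float)
    csum = np.cumsum(daily, axis=-1)
    # Prepend a zero so that differences of cumulative sums give window sums.
    csum = np.concatenate([np.zeros_like(csum[..., :1]), csum], axis=-1)
    return (csum[..., window:] - csum[..., :-window]) / window

# Hypothetical 32-day daily forecasts (mm/day) from three models, one start date.
rng = np.random.default_rng(0)
ecmwf, eccc, ukmo = (rng.gamma(2.0, 2.0, size=32) for _ in range(3))
mme = multimodel_mean([ecmwf, eccc, ukmo])

weekly = rolling_mean(mme, 7)      # one value per lead time of 0..25 days
fortnight = rolling_mean(mme, 14)  # one value per lead time of 0..18 days

# Week 1-4 use lead times of 0, 7, 14, and 21 days.
week = {n: weekly[7 * (n - 1)] for n in (1, 2, 3, 4)}
# Week 1-2 and week 3-4 use lead times of 0 and 14 days.
week_1_2, week_3_4 = fortnight[0], fortnight[14]
```

Computing the rolling mean at every lead time, rather than only at the four weekly offsets, keeps the daily, weekly, and fortnight series on a common lead-time axis, which is convenient when skill is compared across temporal scales.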

Deterministic Metrics
The deterministic forecast skills are evaluated using a leave-one-year-out approach, in which the reference climatology is calculated over the period excluding the target year to be verified. Consider, for example, evaluating daily precipitation forecasts initialized on 4 May 2000. The climatology is determined using all forecasts initialized on 4 May 2001-2017. The forecast anomaly for this case is then derived by subtracting the cross-validated climatological mean. The observed anomalies are calculated in the same way.
To take the ensemble size into consideration, we analyze the same verification metrics using only one control and three perturbed ensemble members for all three models. The three perturbed ensemble members are drawn from all combinations of the remaining ensemble members for each GCM. Thus, there are $C_{10}^{3} = 120$ combinations for the ECMWF model and $C_{6}^{3} = 20$ combinations for the UKMO model. The deterministic and probabilistic forecast skills are then calculated by averaging the skill scores over all combinations.
The mean squared skill score (MSSS) is given by
$$\mathrm{MSSS} = 1 - \frac{\sum_{t=1}^{T} (H_t - O_t)^2}{\sum_{t=1}^{T} (O_t - \bar{O})^2}$$
where $H_t$ is the ensemble mean of the anomaly of sub-seasonal precipitation forecasts for case $t$, $t = 1, 2, \cdots, T$; $O_t$ is the corresponding observed anomaly; and $\bar{O}$ is the average of observed anomalies for all cases. The MSSS compares the mean square error of the GCM forecasts to that of the climatology forecasts and can be expanded as
$$\mathrm{MSSS} = 2 r \frac{s_H}{s_O} - \left(\frac{s_H}{s_O}\right)^2 - \left(\frac{\bar{H} - \bar{O}}{s_O}\right)^2$$
where $r$ is the correlation coefficient between forecast anomalies and observed anomalies; $s_H$ and $s_O$ are the standard deviations of forecast anomalies and observed anomalies, respectively; and $\bar{H}$ is the mean value of forecast anomalies for all cases [43].
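As a hedged illustration of the leave-one-year-out anomalies and the MSSS, the sketch below computes the score from its definition (one minus the ratio of the forecast mean square error to that of the climatology) and from the standard expansion in terms of the correlation, the ratio of standard deviations, and the mean bias. The function names and the synthetic 18-year series are assumptions made for the example, and population standard deviations are assumed so that the two forms agree exactly.

```python
import numpy as np

def loyo_anomalies(x):
    """Leave-one-year-out anomalies.

    x: array of shape (n_years, ...) holding values for one start date and
    lead time. Each year's climatology is the mean of all *other* years.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    clim = (x.sum(axis=0, keepdims=True) - x) / (n - 1)
    return x - clim

def msss(h, o):
    """MSSS from its definition: 1 - sum((H-O)^2) / sum((O-mean(O))^2)."""
    h, o = np.asarray(h, dtype=float), np.asarray(o, dtype=float)
    return 1.0 - np.sum((h - o) ** 2) / np.sum((o - o.mean()) ** 2)

def msss_decomposed(h, o):
    """Same score from correlation, variance ratio, and mean bias."""
    h, o = np.asarray(h, dtype=float), np.asarray(o, dtype=float)
    r = np.corrcoef(h, o)[0, 1]
    s_h, s_o = h.std(), o.std()  # population standard deviations (ddof=0)
    return 2 * r * s_h / s_o - (s_h / s_o) ** 2 - ((h.mean() - o.mean()) / s_o) ** 2

# Synthetic example: 18 hindcast years for one start date and lead time.
rng = np.random.default_rng(1)
obs = rng.gamma(2.0, 3.0, size=18)              # hypothetical observations
fc = 0.6 * obs + rng.normal(0.0, 2.0, size=18)  # hypothetical ensemble means
h, o = loyo_anomalies(fc), loyo_anomalies(obs)
score = msss(h, o)
```

The expansion makes the sources of lost skill visible: a forecast with a good correlation can still score below zero if its anomalies are too large in amplitude or carry a mean bias.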

Probabilistic Metrics
It is difficult to evaluate probabilistic forecasts of the selected models directly, as the ensemble sizes are too small to estimate probabilities accurately [44]. In this study, the extended logistic regression (ELR) is used to calculate the probabilities for tercile-based events:
$$\ln\left(\frac{p}{1 - p}\right) = \theta_0 + \theta_1 H + \theta_2 q$$
where $p$ is the probability of not exceeding the quantile $q$; $H$ is the ensemble mean of sub-seasonal precipitation forecasts; and $\theta = \{\theta_0, \theta_1, \theta_2\}$ are parameters to be estimated. It has been shown that the extended logistic regression yields logically consistent sets of forecasts [26,31]. In this study, the ELR model is built for daily, weekly, and fortnight forecasts at both grid and regional scales, again following a leave-one-year-out approach. For the example case used in the deterministic evaluation, the 33rd and 67th percentiles of the observations over the period of 2001-2017 are defined as the quantiles. Forecasts and observations initialized on 4 May during the period of 2001 to 2017 are pooled together for parameter inference of the ELR model.
The ranked probability skill score (RPSS) is used to evaluate the forecast skills. The RPSS is defined as
$$\mathrm{RPSS} = \left[1 - \frac{\sum_{t=1}^{T} \sum_{k=1}^{K} \left(\sum_{m=1}^{k} p_{m,t} - \sum_{m=1}^{k} o_{m,t}\right)^2}{\sum_{t=1}^{T} \sum_{k=1}^{K} \left(\sum_{m=1}^{k} c_{m,t} - \sum_{m=1}^{k} o_{m,t}\right)^2}\right] \times 100\%$$
where $p_{m,t}$ is the forecast probability assigned to the $m$th category for case $t$, $c_{m,t}$ is the climatological probability assigned to the $m$th category, $o_{m,t}$ is one when the observation falls into the $m$th category and zero otherwise, and $K = 3$ is the number of categories. The RPSS ranges from −∞ to 100%. A higher RPSS value indicates higher accuracy. When the RPSS is 0%, the probabilistic forecasts show no improvement over the cross-validated climatological forecasts.
The attribute diagram is used to evaluate the reliability, resolution, and sharpness of the ELR-based tercile category probabilistic forecasts. The attribute diagram shows the observed frequencies against the forecast probabilities [45]. In this study, the three event classes of below-, near-, and above-normal are first defined by equally dividing the cross-validated climatology into terciles. The forecast probability is divided into 5 equal-width groups, which are 0-20%, 20-40%, 40-60%, 60-80%, and 80-100%. The corresponding observed relative frequency is plotted against the mean forecast probability for each group. The forecasts are reliable if the points lie along the 45-degree diagonal. The sharpness is also shown on the attribute diagram, where the size of each dot indicates the fraction of forecasts in that group. The forecasts are sharp if the forecast probabilities tend to be either very high (e.g., >90%) or very low (e.g., <10%) [46].
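The chain from an ELR model to tercile probabilities and the RPSS can be sketched as follows. This is a minimal illustration rather than the fitting procedure of the study: it assumes the common linear ELR form in which the log-odds of not exceeding a quantile are θ0 + θ1·H + θ2·q, and the parameter values, tercile thresholds, forecasts, and observed categories are all made up for the example. No parameter estimation is performed.

```python
import numpy as np

def elr_cdf(h, q, theta):
    """P(obs <= q | ensemble mean h) under a linear ELR model (assumed form)."""
    t0, t1, t2 = theta
    return 1.0 / (1.0 + np.exp(-(t0 + t1 * h + t2 * q)))

def tercile_probs(h, q33, q67, theta):
    """Below-, near-, and above-normal probabilities from one ELR model."""
    p_low = elr_cdf(h, q33, theta)
    p_high = elr_cdf(h, q67, theta)
    # With theta[2] > 0 the CDF increases with q, so all three are >= 0.
    return np.stack([p_low, p_high - p_low, 1.0 - p_high], axis=-1)

def rpss(p, c, o):
    """Ranked probability skill score in percent.

    p, c, o: arrays of shape (n_cases, n_categories) holding forecast
    probabilities, climatological probabilities, and one-hot observations.
    """
    rps_f = np.sum((np.cumsum(p, axis=-1) - np.cumsum(o, axis=-1)) ** 2)
    rps_c = np.sum((np.cumsum(c, axis=-1) - np.cumsum(o, axis=-1)) ** 2)
    return 100.0 * (1.0 - rps_f / rps_c)

# Hypothetical values, chosen only to exercise the functions.
theta = (0.0, -0.8, 0.8)          # NOT fitted; illustrative parameters
q33, q67 = 2.0, 6.0               # illustrative tercile thresholds (mm/day)
h = np.array([1.0, 4.0, 9.0])     # ensemble-mean forecasts for three cases
p = tercile_probs(h, q33, q67, theta)
o = np.eye(3)[[0, 1, 2]]          # observed categories: below, near, above
c = np.full_like(p, 1.0 / 3.0)    # climatological tercile probabilities
skill = rpss(p, c, o)
```

Because a single regression supplies the cumulative probability at both thresholds, the three category probabilities always sum to one and cannot become negative, which is the consistency property that motivates using one ELR model instead of separate logistic regressions per threshold.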

Sources of Sub-Seasonal Predictability
To diagnose the impact of ENSO and MJO on weekly precipitation variability over China, composite analysis of precipitation anomalies at different ENSO and MJO phases is conducted in this study.
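A composite analysis of this kind amounts to averaging the anomalies of all weeks that share a phase label. The sketch below assumes integer phase labels, with 0 used as a hypothetical code for weeks without an active event; the significance test applied to the composites is not reproduced here.

```python
import numpy as np

def composite_by_phase(anomalies, phases):
    """Mean anomaly for each phase label.

    anomalies: array of shape (n_weeks, ...) of weekly precipitation anomalies.
    phases:    array of shape (n_weeks,) with one phase label per week.
    Returns a dict mapping each label to the mean anomaly over its weeks.
    """
    anomalies = np.asarray(anomalies, dtype=float)
    phases = np.asarray(phases)
    return {
        label.item(): anomalies[phases == label].mean(axis=0)
        for label in np.unique(phases)
    }

# Hypothetical anomalies (mm) for six weeks and two regions.
anoms = np.array([[0.5, -1.0],
                  [2.0, 1.0],
                  [4.0, 3.0],
                  [-3.0, 0.0],
                  [-1.0, -2.0],
                  [-2.0, -1.0]])
labels = np.array([0, 3, 3, 5, 5, 5])  # 0 = no active event (assumed code)
composites = composite_by_phase(anoms, labels)
```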
The phase of an ENSO event is measured by the weekly Niño-3.4 index, defined as the sea surface temperature (SST) anomalies averaged over the region 5°N-5°S, 170°W-120°W. The Optimum Interpolation SST, version 2.1 (OISST.v2.1), is used here to derive the weekly Niño-3.4 index for the period of 2000-2017. The weekly anomalies are calculated and standardized using a leave-one-year-out cross-validation approach. An El Niño (La Niña) event is defined when five consecutive 3-month running means of SST anomalies are above (below) the threshold of +0.5 °C (−0.5 °C).
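The event definition above can be expressed as a run-length test on the running-mean series. The sketch below assumes a monthly Niño-3.4 anomaly series as input and uses ASCII labels `el_nino`, `la_nina`, and `neutral`; the input values and function names are illustrative and do not come from the OISST data.

```python
import numpy as np

def running_mean_3(x):
    """3-month running mean; the output has len(x) - 2 values."""
    return np.convolve(np.asarray(x, dtype=float), np.ones(3) / 3.0, mode="valid")

def flag_runs(mask, min_len=5):
    """True where `mask` lies in a run of at least `min_len` consecutive Trues."""
    mask = np.asarray(mask, dtype=bool)
    out = np.zeros_like(mask)
    start = None
    # A trailing False closes a run that reaches the end of the series.
    for i, m in enumerate(np.append(mask, False)):
        if m and start is None:
            start = i
        elif not m and start is not None:
            if i - start >= min_len:
                out[start:i] = True
            start = None
    return out

def enso_phase(nino34_anom, threshold=0.5, min_len=5):
    """Label each 3-month running mean as el_nino, la_nina, or neutral."""
    rm = running_mean_3(nino34_anom)
    phase = np.full(rm.shape, "neutral", dtype=object)
    phase[flag_runs(rm > threshold, min_len)] = "el_nino"
    phase[flag_runs(rm < -threshold, min_len)] = "la_nina"
    return rm, phase

# Hypothetical monthly Nino-3.4 anomalies (deg C).
anom = [0.1, 0.2, 0.8, 0.9, 1.0, 1.1, 1.0, 0.9, 0.8, 0.2,
        -0.1, -0.6, -0.7, -0.6, 0.0, 0.1]
rm, phase = enso_phase(anom)
```

In this example the warm spell yields seven consecutive running means above +0.5 °C and is labelled as an El Niño event, whereas the short cold spell produces only one running mean below −0.5 °C and therefore stays neutral.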
The phase of an MJO event is measured by the Real-time Multivariate MJO (RMM) index components (RMM1 and RMM2). In this study, the zonal wind (850 hPa and 200 hPa) is obtained from the ERA5 reanalysis, and the outgoing longwave radiation (OLR) is derived from the Climate Data Record Program of NOAA (http://doi.org/10.7289/V5SJ1HH2, accessed on 30 January 2021) for the same period of 2000-2017. The weekly anomalies of zonal wind and OLR are computed using a leave-one-year-out cross-validation approach. The forecast metrics are then compared at different ENSO and MJO phases to investigate the influence of large-scale circulations on sub-seasonal precipitation forecasts over China.

Deterministic Evaluation
Positive MSSS values are found in southeastern, northern, and southwestern China for the MME week 2 forecasts, but with a much lower magnitude compared to week 1. The week 1-2 forecasts are also found to be skillful over these regions, and the MSSS values are higher than those of the week 2 forecasts alone. Little skill can be found when the lead time is beyond 2 weeks. Negative MSSS values are found almost everywhere for week 3, week 4, and week 3-4 precipitation forecasts.
The MSSS values of daily, weekly, and fortnight precipitation forecasts at the grid scale are compared in Figure 5 (outliers are not shown for clarity). Nearly 90% of grid cells have negative MSSS values when the lead time is beyond 10 days for daily precipitation forecasts, and the multimodel ensembling has the highest forecast skills compared to the ECMWF, ECCC, and UKMO. The forecast skills are improved after temporal aggregation. For example, the MSSS values of weekly precipitation forecasts range from nearly −0.4 to 0.25 when the lead time is 10 days for multimodel ensembling forecasts, while those of the daily forecasts range from nearly −0.4 to 0.1. However, we should also note that the effect of temporal aggregation is limited at longer lead times.
In Figure 6, the MSSS of precipitation forecasts at the regional scale is presented. The results suggest that the forecast skills of precipitation forecasts at the regional scale are higher than those at the grid scale, especially at higher temporal aggregation levels. The MSSS values of multimodel ensembling fortnight precipitation forecasts show promising skills over Region 2 (Inland River in northern Tibet), Region 9 (Upper Yangtze River), and Region 12 (Southwest rivers in Yunnan) at longer lead times. This suggests that spatiotemporal aggregation can help to extract a predictable sub-seasonal signal over China for certain regions.
Figure 6. Mean squared skill score of sub-seasonal precipitation forecasts at different temporal scales (daily, weekly, and fortnight) of the ECMWF model, the ECCC model, the UKMO model, and the MME model for each region during the boreal summer monsoon.

Probabilistic Evaluation
The ranked probability skill scores (RPSS) of daily precipitation forecasts are given in Figure 7.
Figure 9 compares the RPSS of probabilistic precipitation forecasts at different temporal scales over China. Similar to the results of MSSS, the probabilistic forecasts are found to be skilful when the lead time is within 10 days. The multimodel ensembling also helps to improve probabilistic forecast skills. However, we should note that temporal aggregation may have a limited effect on probabilistic forecast skills. The RPSS of daily precipitation forecasts mostly ranges from −15% to 15% when the lead time is beyond 10 days. However, the RPSS values of weekly and fortnight forecasts are always below zero at the same lead time.
The RPSS values of probabilistic precipitation forecasts at the regional scale are shown in Figure 10. Skilful daily probabilistic precipitation forecasts are found over Region 2 (Inland rivers in Xinjiang), Region 5 (Upper Yellow River), Region 7 (Songhua River), Region 9 (Upper Yangtze River), Region 12 (Southwest rivers in Yunnan), and Region 13 (Yangtze River) when the lead time is shorter than 10 days. The RPSS values of weekly precipitation forecasts are higher than those of the daily forecasts over Region 2 (Inland rivers in Xinjiang) and Region 9 (Upper Yangtze River) at the same lead time. However, the RPSS values of weekly forecasts are lower than those of the daily forecasts in other regions. In addition, the RPSS values of fortnight forecasts are lower than those of both the daily and weekly forecasts. This suggests that spatiotemporal aggregation can either improve or degrade probabilistic forecast skills.
The reliability of probabilistic forecasts at the grid scale is presented in Figures 11 and 12 by pooling all grid points together. The multimodel ensembling shows high reliability and sharpness for below-normal category forecasts, especially when the lead time is within 2 weeks. The probabilistic forecasts show lower reliability for near-normal and above-normal categories. Similar results are also found for the ECMWF, ECCC, and UKMO forecasts at regional scale (not shown).

The Impact of ENSO and MJO on Sub-Seasonal Predictability
The above results suggest that sub-seasonal precipitation forecast skill is mostly found in the first week and decreases quickly when the lead time is beyond 1 week. In this section, we explore the impact of ENSO and MJO on sub-seasonal precipitation variability and the forecast skills over China. Figure 13 presents the composites of weekly precipitation anomalies (mm) in each of the ENSO and MJO phases during the boreal summer monsoon. The precipitation anomalies clearly change under different ENSO conditions. Compared to ENSO, the MJO has a greater impact on sub-seasonal precipitation variability, especially in southern China. However, the response of weekly precipitation differs between regions in certain MJO phases. The weekly precipitation is significantly enhanced over Region 9 (Upper Yangtze River), Region 13 (Yangtze River), and Region 16 (Pearl River) during phase 3, and significantly suppressed over Region 11 (Southwest rivers in southern Tibet), Region 15 (Lower Yangtze River), and Region 16 (Pearl River) during phase 5. The weekly precipitation anomalies of Region 12 (Southwest rivers in Yunnan) and Region 13 (Yangtze River) are significantly suppressed during phase 8. These results are consistent with Xavier et al. [47], in which phases 2~4 produced significantly increased precipitation and phases 6~8 produced significantly decreased precipitation over Southeast Asia.

Figure 13. Composites of weekly precipitation anomalies (mm) in each of the ENSO and MJO phases during the boreal summer monsoon. Statistical significance at the 5% level is labelled.
Figures 14 and 15 compare the RPSS values of week 1, week 2, week 3, and week 4 precipitation forecasts in each of the ENSO and MJO phases during the boreal summer monsoon. The forecast skills are enhanced over Region 2 (Inland rivers in northern Tibet), Region 9 (Upper Yangtze River), and Region 12 (Southwest rivers in Yunnan) during the El Niño or La Niña phase compared to the neutral phase, especially at longer lead times. The RPSS values of week 4 forecasts are below 0% over most regions during the neutral phase for the ECMWF, ECCC, UKMO, and MME forecasts. In contrast, positive RPSS values are observed over Region 2 (Inland rivers in northern Tibet) and Region 3 (Inland rivers in Inner Mongolia) for the ECCC, UKMO, and MME forecasts, and over Region 11 (Southwest rivers in southern Tibet) and Region 12 (Southwest rivers in Yunnan) for the MME forecasts. The enhancement is more pronounced during active MJO phases. The RPSS of week 2 forecasts is lower than 10% over most regions during the weak MJO phase for the ECMWF, ECCC, UKMO, and MME forecasts. In contrast, the RPSS is greatly improved during active MJO phases, especially in phases 7~8 for the ECCC, UKMO, and MME forecasts. The RPSS values of week 3 and week 4 forecasts are negative over all regions and for all models during the weak MJO phase. In contrast, positive RPSS values are found over Region 2 (Inland rivers in northern Tibet), Region 9 (Upper Yangtze River), Region 14 (Middle Yangtze River), and Region 15 (Lower Yangtze River) for week 3 and week 4 MME forecasts.

Discussion
In this study, we evaluate the sub-seasonal precipitation forecast skills at various spatiotemporal scales over China during the boreal summer monsoon. The results suggest that skillful sub-seasonal precipitation forecasts are only found when the lead time is within 1 week. The forecast skills decrease rapidly when the lead time is beyond 1 week for both deterministic and probabilistic forecasts. Positive skill scores are only found over southeastern and southwestern China. These results are similar to those of de Andrade, Coelho and Cavalcanti [4], who found that the week 3 and week 4 deterministic forecast skills are low in extratropical regions. This is probably due to both large climatic noise and the limited intraseasonal oscillation (ISO) signal of precipitation during the summer monsoon [48,49]. The precipitation amount is higher in southeastern China owing to the impact of the East Asian Summer Monsoon (EASM). The relatively higher forecast skills in these regions may be due to the reasonable prediction of the ISO of the EASM despite some systematic errors [14]. The relatively lower coefficient of variation of daily precipitation in southwestern China shown in Figure 2 suggests that the predictability of precipitation is higher there than in other regions. Although limited precipitation is observed in northwestern China, the high coefficient of variation suggests that the precipitation is highly non-uniformly distributed in time during the boreal summer monsoon.
Although the GCMs are able to simulate the amount of precipitation in these arid regions, the variability is always strongly underestimated by the GCMs. This may partly explain the relatively lower forecast skills in these regions. In comparison, the amount of precipitation over Regions 6, 7, and 8 is higher than that over northwestern China, and the coefficient of variation is lower. However, the interactions between tropical monsoon variability and high-latitude circulation systems are more difficult to simulate in the GCMs. This is probably the main reason for the lower predictive skills in northeastern China.
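The role of the coefficient of variation in this argument can be illustrated with a small Python sketch. The two daily series below are invented for illustration only (they are not data from this study): both have the same mean, but the "arid" series concentrates its rainfall in a single day.

```python
from statistics import mean, pstdev


def coefficient_of_variation(daily_precip):
    """Population standard deviation of daily precipitation divided by its mean."""
    m = mean(daily_precip)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(daily_precip) / m


# Hypothetical daily precipitation (mm/day) with identical means of 2 mm/day.
arid_like = [0.0, 0.0, 0.0, 8.0]   # rare, intense event
humid_like = [1.0, 2.0, 3.0, 2.0]  # frequent, moderate rainfall

cv_arid = coefficient_of_variation(arid_like)
cv_humid = coefficient_of_variation(humid_like)
```

A model that reproduces the mean of the arid-like series but smooths out its single event would match the precipitation amount while underestimating the variability, which is the situation described above.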
We should also note that the ECMWF and UKMO models outperform the ECCC model for sub-seasonal precipitation forecasts. This result indicates that the ECMWF and UKMO models may benefit from being coupled with ocean models [50]. Compared to the ECMWF and ECCC models, multimodel ensembling helps to greatly reduce the large negative skill scores over Inner Mongolia. Similar results were also found by Vigaud, Robertson and Tippett [31]. However, the number of GCMs used for multimodel ensembling in this study is limited, as different GCMs have different start dates and forecast frequencies. If GCM forecasts are produced in a more harmonized way, the sub-seasonal precipitation forecast skills may be further improved with a larger number of GCMs.
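One simple way to form a multimodel probabilistic forecast is to pool the tercile probabilities of the individual models. The Python sketch below assumes equal weights by default; this weighting, the function name `pool_probabilities`, and the example numbers are assumptions made for illustration and should not be read as the exact scheme used in this study.

```python
def pool_probabilities(model_probs, weights=None):
    """Weighted average of tercile probabilities across models.

    model_probs: one (below, near, above) probability triple per model.
    weights: optional model weights summing to 1; equal weights if omitted.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_cat = len(model_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, model_probs))
        for k in range(n_cat)
    ]


# Hypothetical tercile probabilities from three models for one region and week.
mme = pool_probabilities([(0.6, 0.3, 0.1), (0.2, 0.4, 0.4), (0.4, 0.2, 0.4)])
```

Because the pooled forecast is less sharp than an overconfident single model, a confident miss by one model is tempered by the others, which is one plausible reason why large negative skill scores are reduced.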
The extended logistic regression (ELR) model is used in this study to generate tercile-based probabilistic forecasts. The results suggest that the ELR model can produce skillful probabilistic forecasts when the lead time is within 1 week. Meanwhile, the ELR-based probabilistic forecasts show high reliability for below-normal category forecasts. However, the parametric uncertainty of the ELR model is not considered in this study. In the future, Bayesian post-processing methods could be applied to account for this uncertainty. For example, the Bayesian Joint Probability (BJP) method has been used to generate skillful and reliable precipitation forecasts from GCMs [33,51]. Meanwhile, the prediction of extreme weather or climate events at sub-seasonal time scales should also be considered. Compared to tercile-based categorical events, extreme weather or climate events are rarer and usually have significant socioeconomic impacts [52,53]. Lavaysse, et al. [54] suggested that 40% of the meteorological droughts could be detected 1 month ahead by using GCM sub-seasonal precipitation forecasts. However, more work is needed to estimate the sub-seasonal predictability of other types of extreme events, such as tropical cyclones, flooding, and tornadoes [55].
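The defining feature of ELR is that the threshold itself enters the regression as a predictor, so a single fitted equation yields mutually consistent probabilities for all categories. The Python sketch below shows only the prediction step, in the generic form P(y <= q | x) = 1 / (1 + exp(-(b0 + b1 x + b2 q))). The coefficient values, tercile boundaries, and function names are hypothetical, and transformations of the predictor and threshold (square roots are common for precipitation in the ELR literature) are omitted for brevity.

```python
import math


def elr_cdf(x, q, b0, b1, b2):
    """Probability that precipitation does not exceed threshold q, given predictor x."""
    if b2 <= 0:
        raise ValueError("b2 must be positive so that the probability increases with q")
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x + b2 * q)))


def tercile_probabilities(x, q_lower, q_upper, b0, b1, b2):
    """Below-, near-, and above-normal probabilities from one ELR equation."""
    p_lower = elr_cdf(x, q_lower, b0, b1, b2)
    p_upper = elr_cdf(x, q_upper, b0, b1, b2)
    return p_lower, p_upper - p_lower, 1.0 - p_upper


# Hypothetical coefficients; x is an ensemble-mean weekly total (mm), and
# 5 mm and 15 mm stand in for the climatological tercile boundaries.
coeffs = dict(b0=0.0, b1=-0.5, b2=0.5)
dry_case = tercile_probabilities(5.0, 5.0, 15.0, **coeffs)
wet_case = tercile_probabilities(20.0, 5.0, 15.0, **coeffs)
```

With a negative `b1`, a wetter ensemble mean lowers the non-exceedance probabilities and shifts weight to the above-normal category. Because the coefficients are treated as fixed numbers here, the sketch also makes visible the parametric uncertainty that is ignored: a Bayesian treatment would replace them with a distribution.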
The sub-seasonal precipitation forecast skills are improved at larger spatial scales or longer temporal scales when the lead time is within 1 week. However, spatiotemporal aggregation has a limited effect on forecast skills at longer lead times. The forecast skills increase only in regions where the predictable low-frequency signals remain. This is consistent with the findings of van Straaten, Whan, Coumou, van den Hurk and Schmeits [27], who suggest that spatiotemporal aggregation should be limited in certain cases.
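Spatiotemporal aggregation of this kind amounts to averaging forecasts and observations over a time window and over neighboring grid cells before the skill scores are computed. A minimal Python sketch is given below; it assumes non-overlapping time windows and unweighted spatial means (an operational implementation would typically weight grid cells by area), and the function names are illustrative.

```python
def temporal_aggregate(daily, window):
    """Means over non-overlapping windows of `window` days.

    A trailing incomplete window is dropped.
    """
    n_full = len(daily) // window
    return [
        sum(daily[i * window:(i + 1) * window]) / window
        for i in range(n_full)
    ]


def spatial_aggregate(cells):
    """Unweighted mean across grid-cell series of equal length."""
    return [sum(values) / len(values) for values in zip(*cells)]


# Hypothetical daily precipitation (mm/day) at two neighboring grid cells.
cell_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
cell_b = [3.0, 2.0, 1.0, 6.0, 5.0, 4.0, 9.0]
regional_3day = temporal_aggregate(spatial_aggregate([cell_a, cell_b]), 3)
```

Averaging suppresses unpredictable small-scale noise, but it can only raise skill if a predictable low-frequency signal is present in the aggregated series, which is consistent with the limited benefit seen at longer lead times.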
We also analyze the impact of ENSO and MJO on weekly precipitation variability and forecast skills over China. The results suggest that the sub-seasonal precipitation forecast skills are improved during active ENSO or MJO phases. However, the improvement varies across ENSO and MJO phases. Although the MJO significantly reduces the weekly precipitation over Region 15 (Lower Yangtze River) and Region 16 (Pearl River) during phase 5, the forecast skills of weekly precipitation are not significantly improved. This suggests that the tropical-extratropical interactions are not well simulated under such conditions. Meanwhile, the impact of other large-scale circulation patterns, such as the Arctic Oscillation (AO), North Atlantic Oscillation (NAO), and Pacific-North American (PNA) pattern, should also be considered. Wang and Robertson [56] suggested that the seasonal variability of the AO contributes to higher skills in week 3-4 precipitation forecasts.
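The composite analysis referred to above can be outlined as a grouping of weekly precipitation anomalies by MJO phase. The Python sketch below assumes that a week counts as "weak MJO" when the MJO index amplitude is below 1, a common convention that is treated here as an assumption; the function name and the sample values are hypothetical.

```python
from collections import defaultdict


def mjo_composites(anomalies, phases, amplitudes, active_threshold=1.0):
    """Mean weekly precipitation anomaly per MJO phase.

    Weeks with amplitude below `active_threshold` are pooled under "weak".
    """
    groups = defaultdict(list)
    for anomaly, phase, amplitude in zip(anomalies, phases, amplitudes):
        key = phase if amplitude >= active_threshold else "weak"
        groups[key].append(anomaly)
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical weekly anomalies (mm) for one region with matching MJO states.
composites = mjo_composites(
    anomalies=[2.0, 4.0, -1.0, -3.0, 0.5],
    phases=[1, 1, 5, 5, 3],
    amplitudes=[1.5, 2.0, 1.2, 1.1, 0.4],
)
```

Comparing skill scores computed separately for each group with these composites shows whether a strong observed precipitation response in a given phase is matched by a gain in forecast skill.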

Conclusions
Sub-seasonal precipitation forecasts during the boreal summer monsoon season are valuable for both flood and drought disaster mitigation over China. In this study, we evaluate both deterministic and probabilistic sub-seasonal precipitation forecast skills. The sub-seasonal daily precipitation forecasts are skillful and reliable when the lead time is within 1 week for both deterministic and probabilistic forecasts. The forecast skills decrease rapidly when the lead time is beyond 1 week. Positive skill scores are only found over southeastern and southwestern China. Multimodel ensembling helps to improve deterministic forecast skills. Large negative skill scores are removed when the multimodel ensembling strategy is used, especially over northwestern China. The forecast skills are also improved at larger spatial scales or longer temporal scales. However, the improvement is only observed for certain regions when the lead time is within 10-14 days. When the lead time is beyond 2 weeks, spatiotemporal aggregation has a limited effect on forecast skills.
The composite analysis of weekly precipitation anomalies suggests that both ENSO and the MJO have an impact on precipitation variability over China. However, the influence of ENSO and the MJO varies across phases. The weekly precipitation is significantly enhanced over southeastern China during MJO phases 1~3, while the precipitation is suppressed during MJO phases 4~5. The forecast skills are found to be enhanced during active ENSO and MJO phases, and the enhancement is more pronounced during active MJO phases. However, the enhancement is not always consistent with the above composite analysis. This suggests that the tropical-extratropical interactions are not well simulated under such conditions.
Although a clear benefit of multimodel ensembling is observed for sub-seasonal precipitation forecasts, the number of GCMs used in this study is limited, as these models have different start dates and forecast frequencies. The sub-seasonal precipitation forecast skills could be further improved with a larger number of GCMs when the forecasts are produced in a more harmonized way in the future. In addition, the combined effect of ENSO and the MJO on sub-seasonal precipitation forecasts has not yet been considered. A more detailed assessment should be conducted to improve the understanding of sub-seasonal predictability in the future.