Post-Processing and Evaluation of Precipitation Ensemble Forecast under Multiple Schemes in Beijiang River Basin

Precipitation is one of the most important factors affecting the accuracy and uncertainty of hydrological forecasting. Numerical weather prediction has progressed considerably over decades of development, but its forecast products still cannot be used directly for hydrological forecasting. This study used the ensemble pre-processor (EPP) to post-process forecasts from the Global Ensemble Forecast System (GEFS) and the Climate Forecast System version 2 (CFSv2) under four designed schemes, and then integrated the two products under the best scheme to investigate forecast accuracy at longer time scales. Several indices, including the correlation coefficient, Nash efficiency coefficient, rank histogram, and continuous ranked probability skill score, were used to evaluate different aspects of the results. The results show that EPP significantly improves the accuracy of the raw forecasts, and that schemes considering cumulative forecast precipitation outperform the scheme that considers only single-day forecasts. Moreover, schemes that incorporate some observed precipitation help to improve accuracy and reduce uncertainty. For medium- and long-term forecasts, the integrated forecast based on post-processed GEFS and CFSv2 is significantly better than CFSv2 alone. These results demonstrate how the deviation of ensemble forecasts can be removed and the accuracy of hydrological forecasting improved at different time scales.


Introduction
Floods are a hazard with potentially serious consequences in loss of life and economic costs, and they occur more frequently under the impact of climate change and human activities [1,2]. Hydrological forecasting has been proven to be an effective non-engineering measure to resist flood disasters and reduce losses; its accuracy depends on the input precipitation data [3]. Therefore, improving the accuracy of precipitation forecasts is an effective way to improve the accuracy of hydrological forecasts. Many studies have shown that numerical weather prediction (NWP) and climate models can significantly improve the skill of precipitation forecasts [4][5][6][7][8].
With the rapid development of computer technology and meteorological science, meteorological ensemble prediction has become superior to traditional prediction in terms of forecast accuracy and forecast period, and it has been widely accepted by many national meteorological departments [9]. The Canadian Meteorological Centre (CMC), the European Centre for Medium-Range Weather Forecasts (ECMWF), the National Centers for Environmental Prediction (NCEP), and the China Meteorological Administration (CMA) have all put numerical ensemble prediction into operation [10]. At present, the Global Ensemble Forecast System (GEFS) [11], Climate Forecast System (CFS) [12], and THORPEX (The Observing System Research and Predictability Experiment) Interactive Grand Global Ensemble (TIGGE) [13] are the most representative products in the world. GEFS and CFS, both operated by the NCEP, each have a meteorological ensemble database covering more than 30 years and have been developed to the second version of their reanalysis and forecast systems. Aiming to accelerate improvements in the accuracy of one-day to two-week high-impact weather forecasts, the World Meteorological Organization launched the 10-year THORPEX plan (2005-2015) and established the TIGGE database, which consists of ensemble forecast data from 10 global NWP centers, starting from October 2006 [13]. In addition, TIGGE can satisfy users' various information needs, such as the development of observation systems, forecasting systems, and data assimilation. Scholars have carried out extensive research based on meteorological ensemble forecasts. Liu et al. [14] evaluated the GFS forecast precipitation after ensemble pre-processing against the measured data of meteorological stations in the Huaihe River, and found that the accuracy of precipitation at short lead times is higher than that at longer lead times, while the accuracy of accumulated precipitation is higher than that of single-day precipitation.
Although the meteorological ensemble forecast has been greatly improved in terms of accuracy and forecast period, it cannot be used to drive a hydrological model directly and needs to be processed by spatial downscaling and deviation correction. The core idea of the processing methods is to establish a statistical relationship between the meteorological elements derived from the weather model and the corresponding measured hydrological elements. For example, the earliest method, Perfect Prognosis, proposed by Klein et al. [15], directly applied the statistical relationship between different observed meteorological elements to the model output. The Model Output Statistics (MOS) proposed by Glahn et al. [16] uses multiple regression to establish the statistical relationship between observations and model outputs. Rank histograms [17], the Bayesian processor method [18], the quantile mapping method [19], and the ensemble pre-processor [20,21] have all greatly promoted the development and application of NWP. Methods of this kind perform well for most meteorological elements but not for precipitation, because precipitation does not occur at every time step, which makes it difficult to fit a statistical distribution.
Quantile mapping is a relatively simple empirical method that connects the cumulative probability distribution of observed elements with that of forecast elements, and it has been widely applied. For example, Verkade et al. [22] used it to correct the deviation of ECMWF precipitation and temperature forecast data, Zhu et al. [23] used it to correct the deviation of precipitation forecasts in GEFS, and Hashino et al. [24] and Wood et al. [25] applied it to the deviation correction of runoff forecasts. They found that the method corrects deviation effectively. However, it lacks stability and reliability owing to the limitations of the method itself [26,27].
The main idea of the Bayesian method is to establish a statistical relationship between historical observations and ensemble forecasts, and then obtain the conditional probability distribution function of the observation given the ensemble forecast [28]. For non-normally distributed variables, the Bayesian method first performs the normal quantile transform (NQT), and then obtains the possible original field of the observation through the inverse function of the NQT after calculating the conditional probability of the observation. Krzysztofowicz et al. [29] verified that the result of the Bayesian method is better than that of the traditional statistical model.
The Bayesian method has effectively promoted research on post-processing methods. Schaake et al. [20] proposed the ensemble pre-processor (EPP), which processes the meteorological ensemble forecast by fitting probability distribution functions and then generates ensemble members. Finally, the ensemble average of the forecast members is taken as the processing result of the numerical weather forecast based on measured data. Moreover, considering the spatiotemporal continuity of ensemble members, Clark et al. [30] and Schaake et al. [20] proposed the "Schaake shuffle" to give the ensemble members continuous time information, thereby reducing forecast errors. This method is widely used around the world. Liu et al. [31] corrected the deviation of the 15-member daily precipitation forecast from NCEP GFS data in China's Huai River basin with the Schaake method and tested the results against the observed data of 167 meteorological stations. Tao et al. [32] used this method to compare the accuracy of five ensemble forecast data sources of TIGGE (CMA, ECMWF, JMA, NCEP, and UKMO) before and after correction and found that the EPP improved the accuracy of all five data sources.
The purpose of this study was to post-process the ensemble forecasts with EPP, identify better schemes for improving the accuracy of precipitation forecasts from GEFS and CFSv2 based on observed data in the Beijiang River Basin (BJR), and explore a way to improve the accuracy of long-term forecasts. This paper is organized as follows. The study area and the main methods used in this study are described in Sections 2 and 3. The main results and a discussion on the variability of dryness/wetness are presented in Section 4. The conclusions are provided in Section 5.

Study Area
The Beijiang River is the secondary tributary of the Pearl River, with an average annual runoff of 42.7 billion m³. The BJR (23°30′-25°42′ N, 112°6′-114°42′ E) is located in the lower reaches of the Pearl River Basin, in the north of Guangdong Province. The drainage area above Shijiao hydrological station is about 38,670 km². The terrain of the BJR is mainly mountainous and hilly, with surface elevations of −1 to 1845 m and terrain slopes of 0-72.79°; the mean slope is 13.37 in this basin. The BJR is a typical subtropical monsoon humid area in southern China, with an average annual rainfall of 1844 mm, about 80% of which is concentrated in the flood season. Historical flood disasters in the basin have been serious and directly threaten the Pearl River Delta in the lower reaches of the basin.

Data
The daily observed precipitation data from meteorological stations (Figure 1 and Table 1) in the BJR were obtained from the China Meteorological Data Network (http://data.cma.cn/). The time span of the dataset is 1 January 1960-31 December 2015. Missing values and outliers in the dataset were checked and treated using the inverse distance weighted method. Detailed information on the stations is given in Table 1, and the annual precipitation over the basin is shown in Figure 2. The GEFS and CFSv2 used in this study were both provided by NCEP. The GEFS is a weather forecast model that uses the breeding scheme [33] to generate ensemble perturbations accounting for the uncertainty of initial conditions for medium-range ensemble prediction [11]. The model consists of 11 ensemble members with daily forecasts from December 1984 to the present, beginning at 00 UTC (Coordinated Universal Time) every day, and uses an Eulerian horizontal resolution of T254 for the first 8 days of the forecast and T190 for the second 8 days, with the data in GRIB2 format. The dataset contains multiple weather variables such as tropical cyclone, temperature, precipitation, and wind, of which precipitation is the focus of this paper.
The CFSv2 is a fully coupled atmosphere-ocean-land model developed from four independently designed pieces of technology [30]. It uses the latest scientific methods to collect or assimilate a variety of data sources including ground observations, high-altitude balloon observations, aircraft observations, and satellite observations. The model has updated forecast data of multiple variables (e.g., Madden-Julian oscillation, sea surface temperature, 2-m air temperature, and precipitation) around the world from 1982 to March 2011, with 9-month hindcasts of every 5 days and 4 forecast cycles of that day, i.e., 00, 06, 12, and 18 UTC, and the same hindcasts of daily data from April 2011 to present, at a spatial resolution of 0.938° × 0.938°.

Ensemble Pre-Processor
Tremendous progress has been made in the numerical weather and climate models mentioned above over the past decades, but the forecasts generated by those models still contain many deviations and uncertainties. EPP is a useful method that post-processes the ensemble forecasts from these models, reduces the deviations and uncertainties, and prepares precipitation ensemble forecasts for hydrological models. This method transforms the time series of single-value quantitative precipitation forecasts into corresponding ensemble forecasts of precipitation.
(1) Time window Because both the observed and forecasted precipitation can take values of 0, regression analysis cannot be performed when only a single day is selected. The solution is to determine a proper time window and extract all the data within the time window around the given day in each year for statistical regression analysis, which increases the significance of the statistical relationships. The time window in this study ranges from 15 to 30 days before and after the given day, which means that the number of samples per year in the time window ranges from 31 to 61. The prerequisite for determining the time window of each given day is to ensure that enough samples can be taken for analysis.
(2) Precipitation threshold It is considered that precipitation has not occurred if the precipitation value is less than the precipitation threshold. The precipitation threshold is determined by arranging the single-value precipitation series of the ensemble forecast in descending order. If the precipitation sequence is $X = \{x_1, x_2, \ldots, x_n\}$, the precipitation threshold $x_m$ can be determined by the following formula.
As long as the percentage value $p$ (e.g., 0.97) is determined, the precipitation threshold $x_m$ can be calculated.
(3) Canonical events The purpose of canonical events is to extract useful information and eliminate forecast errors so that a statistical relationship can be established between historical observations and forecasts, which is then used to correct the deviation of future forecast precipitation and reduce uncertainty. The premise of establishing a canonical event model is to find events with high correlation between historical observed and forecasted data. According to the theory of chaos in the atmospheric system proposed by Lorenz [34], any forecast of meteorological elements beyond 5 days is not reliable, but this does not mean that it is completely useless [32]. Specifically, GEFS and CFSv2 contain forecast data for multiple lead times; a forecast beyond a lead time of 5 days is not necessarily reliable, but the accumulated precipitation still contains useful information. For example, the single-day precipitation forecasts at lead times of 6-15 days are not suitable as canonical events because they are only weakly correlated with the observed precipitation at the corresponding times. However, the cumulative precipitation forecast over the 6-15-day lead times can be regarded as a canonical event if it correlates well with the observed data. The joint probability distribution of the measured and predicted meteorological elements is constructed on the basis of the canonical events.
(4) Conditional distribution The construction process of the conditional distribution of canonical events includes designing canonical event, calculating the marginal distribution of measured and forecasted precipitation, converting the marginal distribution into a standard normal distribution, and forming the conditional distribution of observations for a given precipitation forecast.
Assume that X represents the forecast series, Y represents the observation series, the time window has been calculated to be w, and the time series length of the research data is n years. Then, the canonical events can be written as $X = (x)_{n \times w}$ and $Y = (y)_{n \times w}$, and their marginal distributions can be expressed as

$$F(x) = (1 - p_x) + p_x F_{XC}(x \mid x > 0)$$

$$F(y) = (1 - p_y) + p_y F_{YC}(y \mid y > 0)$$

where $p_x$ and $p_y$ represent the probability of precipitation in the forecasts and in the observations, respectively; $F_{XC}(x \mid x > 0)$ and $F_{YC}(y \mid y > 0)$ represent the cumulative probability distributions of forecasted and observed precipitation when it rained, respectively; and $F(x)$ and $F(y)$ represent the cumulative probability distributions of forecasted and observed precipitation. The conditional cumulative probability distributions can be fitted by the Gamma, Weibull, or exponential distribution. The Gamma distribution is used in this paper.
It is difficult to find the joint distribution of forecasted and observed precipitation directly because both are non-normally distributed. The EPP therefore converts the marginal distributions into the standard normal distribution by the NQT. The conversion process is described in Ye et al. [35].
The correspondence can be expressed as follows.
According to the sampling results of the random variables U and V in the normal distribution space, the inverse function of the NQT yields the conditional probability distribution of the measured precipitation given the original forecast, as well as the members of the ensemble forecast at the corresponding time points.
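The conditional-distribution steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the wet-day amounts are fitted with an exponential distribution instead of the Gamma distribution (to stay within the standard library), all zero values map to a single point in normal space, and the transformed pair is assumed to be bivariate standard normal with correlation `rho`, so that V given U = u follows N(rho·u, 1 − rho²). All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def fit_mixed_marginal(values):
    """Return (p_wet, scale): probability of precipitation and the
    exponential scale of wet amounts (assumption: exponential, not Gamma)."""
    wet = [v for v in values if v > 0.0]
    p_wet = len(wet) / len(values)
    scale = mean(wet) if wet else 1.0  # placeholder scale if no wet samples
    return p_wet, scale


def mixed_cdf(x, p_wet, scale):
    """F(x) = (1 - p) + p * F_C(x | x > 0), with an exponential F_C."""
    if x <= 0.0:
        return 1.0 - p_wet
    return (1.0 - p_wet) + p_wet * (1.0 - math.exp(-x / scale))


def mixed_quantile(prob, p_wet, scale):
    """Inverse of mixed_cdf; probabilities inside the dry mass map to 0."""
    if prob <= 1.0 - p_wet:
        return 0.0
    cond = min((prob - (1.0 - p_wet)) / p_wet, 1.0 - 1e-12)
    return -scale * math.log(1.0 - cond)


def nqt(x, p_wet, scale, eps=1e-6):
    """Normal quantile transform z = Phi^{-1}(F(x)), clipped away from 0 and 1."""
    prob = min(max(mixed_cdf(x, p_wet, scale), eps), 1.0 - eps)
    return STD_NORMAL.inv_cdf(prob)


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either series is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0.0 else 0.0


def generate_members(x_new, fcst_hist, obs_hist, n_members=20, seed=0):
    """Sample observation-like members conditional on a new forecast x_new."""
    px, sx = fit_mixed_marginal(fcst_hist)
    py, sy = fit_mixed_marginal(obs_hist)
    u = [nqt(x, px, sx) for x in fcst_hist]  # forecasts in normal space
    v = [nqt(y, py, sy) for y in obs_hist]   # observations in normal space
    rho = pearson(u, v)
    u_new = nqt(x_new, px, sx)
    sigma = math.sqrt(max(0.0, 1.0 - rho * rho))
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        v_sample = rng.gauss(rho * u_new, sigma)  # draw from V | U = u_new
        # Inverse NQT: back from normal space to precipitation amounts
        members.append(mixed_quantile(STD_NORMAL.cdf(v_sample), py, sy))
    return members
```

In this sketch, `fcst_hist` and `obs_hist` would hold the paired canonical-event values pooled from the time window, and `generate_members` returns unshuffled members for a single time point.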

Schaake Shuffle
Because of the randomness of the sampling process, the ensemble members of precipitation lack temporal continuity. The Schaake shuffle (SS) method reconstructs the temporal relationship of the obtained ensemble members.
SS is mainly divided into three steps. First, it arranges the sequence of observations in ascending order at each time point; then, it arranges the ensemble members of the same time point in ascending order; finally, it rearranges the ensemble members according to the ranks of the observations to form a new ensemble forecast with continuous time information. The principle of SS is described in Ye et al. [35]. An example of the calculation process is shown in Appendix A.
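The three steps can be illustrated with a short Python sketch. It is not the Appendix A example; the data layout (one row per time point, one column per member or historical trajectory) and the function name are assumptions made for illustration, and ties among observations are broken by position.

```python
def schaake_shuffle(ensemble, historical):
    """Reorder ensemble members so their ranks follow the historical observations.

    ensemble[t][k]   : value of member k at time point t (unordered in time)
    historical[t][k] : observed value at time point t on historical trajectory k
    Both inputs must have the same shape.
    """
    shuffled = []
    for members, hist in zip(ensemble, historical):
        if len(members) != len(hist):
            raise ValueError("each time point needs one observation per member")
        # Step 2: ensemble members in ascending order
        sorted_members = sorted(members)
        # Step 1: trajectory indices ordered by ascending observation
        order = sorted(range(len(hist)), key=lambda k: hist[k])
        # Step 3: the trajectory holding the r-th smallest observation
        # receives the r-th smallest ensemble member
        row = [0.0] * len(members)
        for rank, k in enumerate(order):
            row[k] = sorted_members[rank]
        shuffled.append(row)
    return shuffled


ensemble = [[5.0, 1.0, 3.0], [2.0, 8.0, 4.0]]
historical = [[10.0, 30.0, 20.0], [7.0, 1.0, 3.0]]
print(schaake_shuffle(ensemble, historical))
# [[1.0, 5.0, 3.0], [8.0, 2.0, 4.0]]
```

After shuffling, column k at every time point carries the same rank as historical trajectory k, so each member inherits the temporal pattern of one observed sequence.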

Model Performance Measures
(1) Correlation coefficient The correlation coefficient (R) is the most commonly used indicator in the field of statistics. In this paper, it is used to measure the linear relationship between the observed precipitation and the forecasted precipitation, which is represented by the mean of the ensemble members. The correlation coefficient is calculated by the following formula

$$R = \frac{\mathrm{Cov}(x, y)}{\mathrm{Std}(x)\,\mathrm{Std}(y)}$$

where $\mathrm{Cov}(x, y)$ refers to the sample covariance, and $\mathrm{Std}(x)$ and $\mathrm{Std}(y)$ refer to the standard deviations of the observed and the forecasted precipitation, respectively.
(2) Root mean square error The root mean square error (RMSE) is the square root of the mean square error (MSE) and can be calculated as follows

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2}$$

where $n$ refers to the number of samples, and $x_i$ and $y_i$ refer to the observed and the forecasted precipitation, respectively.
(3) Nash efficiency coefficient The Nash efficiency coefficient (NSE) is generally used to evaluate whether two sequences are consistent, and it is used in this study to evaluate the accuracy of the ensemble forecast

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (x_i - y_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $\bar{x}$ is the mean of the observed precipitation.
(4) Rank histogram The rank histogram (RH) is used to evaluate the reliability of the probability distribution of the ensemble forecast. If the ensemble forecast is reliable, the observations should fall with equal frequency between any two adjacent ensemble members. The theory and calculation steps of the rank histogram are described in Broecker et al. [36,37].
(5) Continuous ranked probability skill score The continuous ranked probability skill score (CRPSS) measures the continuous ranked probability score (CRPS) of a forecast system relative to that of a reference system. The calculation process is as follows [38,39]

$$\mathrm{CRPS} = \int_{-\infty}^{\infty} \left[ P_f(x) - P_o(x) \right]^2 \, dx$$

$$\mathrm{CRPSS} = 1 - \frac{\mathrm{CRPS}}{\mathrm{CRPS}_{ref}}$$

where $P_f(x)$ and $P_o(x)$ refer to the simulated and observed cumulative distributions, respectively, and $\mathrm{CRPS}_{ref}$ represents the CRPS of the selected reference system.
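As a rough illustration of the deterministic measures and the CRPSS, the sketch below uses only the Python standard library. The function names are hypothetical. The ensemble CRPS is computed with the standard identity CRPS = E|X − y| − ½·E|X − X′|, which equals the integral definition when the forecast distribution is the empirical distribution of the members; whether the study evaluated the integral in this way is not stated in the text.

```python
import math


def pearson_r(obs, fcst):
    """Correlation coefficient R = Cov(x, y) / (Std(x) * Std(y))."""
    n = len(obs)
    mo, mf = sum(obs) / n, sum(fcst) / n
    cov = sum((o - mo) * (f - mf) for o, f in zip(obs, fcst))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_f = sum((f - mf) ** 2 for f in fcst)
    return cov / math.sqrt(var_o * var_f)


def rmse(obs, fcst):
    """Root mean square error between observations and forecasts."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(obs, fcst)) / len(obs))


def nse(obs, fcst):
    """Nash efficiency coefficient: 1 is perfect, 0 matches the observed mean."""
    mo = sum(obs) / len(obs)
    num = sum((o - f) ** 2 for o, f in zip(obs, fcst))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def crps_ensemble(members, obs_value):
    """CRPS of one ensemble forecast against one observation."""
    m = len(members)
    spread_to_obs = sum(abs(x - obs_value) for x in members) / m
    internal_spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return spread_to_obs - internal_spread


def crpss(crps_forecast, crps_reference):
    """Skill score relative to a reference system (1 is perfect, 0 is no skill)."""
    return 1.0 - crps_forecast / crps_reference


obs = [1.0, 2.0, 3.0, 4.0]
ens_mean = [2.0, 3.0, 4.0, 5.0]
print(rmse(obs, ens_mean))              # 1.0
print(crps_ensemble([1.0, 3.0], 2.0))   # 0.5
```

In practice the CRPS would be averaged over all forecast dates before the skill score is formed; the reference system (for example, a climatological ensemble) is a choice left to the user in this sketch.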

Evaluation of the GEFS and CFSv2
The ensemble means of the GEFS and CFSv2 forecasts are evaluated against observed precipitation at the stations of the BJR. Taking the Shaoguan station as an example, Figure 3 displays the correlation coefficients between the observations and the raw GEFS and CFSv2 forecasts. Red indicates high correlation, whereas blue indicates low correlation. As shown in the figure, the correlations of GEFS are larger than those of CFSv2, which means that GEFS is more accurate, and the accuracy of both decreases with lead time. Both GEFS and CFSv2 also have higher correlations in autumn and winter than in spring and summer, especially CFSv2, which shows almost no prediction skill at the beginning of spring (Figure 3b). In addition, GEFS forecasts are notably accurate for the first few days (Figure 3a), and this accuracy can last about one week in winter. A similar characteristic can be detected in the CFSv2 results, but it is less pronounced.
Water 2020, 12, x FOR PEER REVIEW 8 of 18

Verification of the Processed Ensemble Forecasts
According to the analysis of raw GEFS and CFSv2, this study constructed four schemes to improve accuracy using EPP. The designed schemes are displayed in Table 2. Figures 4-7 illustrate the evaluation results of Schemes 1-4, respectively, from which we can conclude that the schemes based on cumulative forecast precipitation (Schemes 2-4) perform significantly better than the single-time-step forecast, and that the post-processed forecasts are more accurate than the raw forecasts. Moreover, the schemes that consider observed information (Schemes 3 and 4) perform better than those that do not. The canonical events of composite time steps designed for GEFS improve the forecast accuracy to a certain extent, whereas the events for CFSv2 do not perform ideally. The results of Scheme 1 show that CFSv2 is better than GEFS to some extent for the single-time-step forecast; however, it changes little under the other schemes. This suggests that CFSv2 carries climate information for longer lead times and is not very sensitive to the canonical events.
RH is used in this study to evaluate the reliability of the ensemble distribution. A perfect forecast appears as observations evenly distributed across the ranks of the ensemble members. The RHs of Scheme 1 are not shown in this study since Figure 4 indicates that it cannot improve the forecast accuracy significantly. Figures 8-10 display the RHs of Shaoguan station under Schemes 2-4, respectively. The figures indicate that the schemes are designed properly to reduce the error of raw GEFS and CFSv2. In comparison, Schemes 3 and 4 are better than Scheme 2 for GEFS, but they are not as good for CFSv2. The RHs of CFSv2 in Scheme 3 are right-skewed and U-shaped, which indicates that the range of ensemble members is narrower and most members are larger than the measured precipitation after post-processing. Scheme 4 shows a more pronounced U shape, indicating that the post-processing results are further narrowed compared with Scheme 3.
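For readers unfamiliar with how such a histogram is built, a minimal Python sketch follows. The function name is hypothetical, and ties between the observation and ensemble members (frequent for zero precipitation) are resolved here by simply counting members strictly below the observation, which is a simplification; operational implementations often break such ties at random.

```python
def rank_histogram(ensembles, observations):
    """Count how often the observation falls into each rank of the sorted ensemble.

    ensembles[t]    : list of member values at time point t
    observations[t] : observed value at time point t
    Returns a list of n_members + 1 counts.
    """
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        # Rank = number of members strictly smaller than the observation
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts


ensembles = [[1.0, 2.0, 3.0]] * 4
observations = [0.5, 1.5, 2.5, 3.5]
print(rank_histogram(ensembles, observations))
# [1, 1, 1, 1]
```

A flat histogram, as in this toy case, corresponds to the reliable forecast described above, while a U shape indicates that observations often fall outside the ensemble range, that is, the ensemble spread is too narrow.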
This study calculated the CRPSS of the raw and post-processed ensemble precipitation forecasts for comparison. Considering the RH results, only the results of Schemes 2 and 3 for Shaoguan station are shown in Figures 11 and 12. The figures indicate that the raw forecasts have a lower CRPSS and that differences exist among stations. In addition, Scheme 2 shows that raw GEFS has higher CRPSS in cool seasons, and Fogang, which lies at a lower latitude, has higher CRPSS than Shaoguan and Nanxiong in other seasons. This may be related to site location and climatic conditions; the annual precipitation of Fogang is larger than that of the other stations. The CRPSS of CFSv2 indicates that it has almost no forecasting skill.
After post-processing, the CRPSS of GEFS and CFSv2 have become significantly larger, which means the EPP based on Schemes 2 and 3 can reduce the errors of ensemble forecast and improve the forecast accuracy. However, there are still many CRPSS of Event 1 in Scheme 2 which are lower than the normal value of other events, indicating that the precipitation samples of Event 1 are still not enough. It is necessary to take measures to increase precipitation samples for post-processing analysis. This problem does not exist in Scheme 3. On the whole, Scheme 3 would be the best one to improve the accuracy of ensemble forecast by EPP. This study calculated the CRPSS of raw and post-processed ensemble forecast precipitation, respectively, to make a comparison. Only the results of Schemes 2 and 3 for Shaoguan station are shown in Figures 11 and 12 considering the results of RH. The figures indicate that the raw forecasts have a lower CRPSS and differences exist in different stations. In addition, Scheme 2 shows that raw GEFS has higher CRPSS in cool seasons, and Fogang, which is lower latitude, has higher CRPSS than Shaoguan and Nanxiong in other seasons. This may be related to site location and climatic conditions; the annual precipitation of Fogang is larger than other stations. The CRPSS of CFSv2 indicates that it has almost no forecasting skills. This study calculated the CRPSS of raw and post-processed ensemble forecast precipitation, respectively, to make a comparison. Only the results of Schemes 2 and 3 for Shaoguan station are shown in Figures 11 and 12 considering the results of RH. The figures indicate that the raw forecasts have a lower CRPSS and differences exist in different stations. In addition, Scheme 2 shows that raw GEFS has higher CRPSS in cool seasons, and Fogang, which is lower latitude, has higher CRPSS than Shaoguan and Nanxiong in other seasons. 
Figure 13 displays the annual average monthly precipitation from observations, the raw GEFS forecast, and the post-processed GEFS forecast, represented by black, blue, and red lines, respectively. The results show that the raw forecast is biased to varying degrees, whereas the post-processed ensemble forecast is close to the observed values. Even in areas where the deviation of the raw ensemble forecast is large, the optimized scheme in this paper greatly reduces it.

Evaluation of Integrated Forecast Precipitation
According to the above analysis, Scheme 3 has the best-designed canonical events for reducing the deviation of the ensemble forecasts. In this section, GEFS and CFSv2 are integrated into a new forecast series named CFSGEFS, in which GEFS is used for the first 15 days and CFSv2 for Days 16-30, and EPP is applied with the canonical events of Scheme 3 to test its performance at longer lead times. We calculated the post-processed forecast precipitation at nine stations and converted it into areal rainfall over the BJR using the Thiessen polygon method for comparison. The results are displayed in Figure 14, where CFS and CFSGEFS denote the evaluation results of post-processed CFSv2 and CFSGEFS at different time scales; Bi-Week 1 and Bi-Week 2 denote the first and last two weeks of a month, respectively.
It can be concluded from Figure 14 that the integrated forecast (CFSGEFS) performs better than the post-processed CFSv2 at weekly and monthly time scales. In addition, the results of Bi-Week 1 are better than those of Bi-Week 2, since GEFS has higher forecast accuracy than CFSv2 in the first days. This result indicates that, although EPP can significantly improve the forecast accuracy of CFSv2 at longer lead times, the integrated forecast post-processed by EPP has better accuracy and skill. The post-processed CFSGEFS precipitation can be used to improve the accuracy and reduce the uncertainty of hydrological prediction at long time scales.
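The two data-handling steps described in this section, splicing the two forecast sources by lead day and converting station values to areal rainfall, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the array layout (stations by lead days), and the polygon areas are hypothetical, and the sketch does not reproduce the EPP post-processing itself.

```python
import numpy as np

def splice_forecasts(gefs, cfsv2, switch_day=15):
    """Use GEFS for lead days 1..switch_day and CFSv2 afterwards.

    gefs, cfsv2: arrays of shape (n_stations, n_lead_days) of daily precipitation.
    GEFS must cover at least switch_day lead days.
    """
    gefs = np.asarray(gefs, dtype=float)
    cfsv2 = np.asarray(cfsv2, dtype=float)
    return np.concatenate([gefs[:, :switch_day], cfsv2[:, switch_day:]], axis=1)

def thiessen_areal_mean(station_values, polygon_areas):
    """Area-weighted mean over stations using Thiessen polygon areas as weights."""
    weights = np.asarray(polygon_areas, dtype=float)
    weights = weights / weights.sum()
    return weights @ np.asarray(station_values, dtype=float)

# Hypothetical example with two stations and placeholder values
gefs = np.ones((2, 16))          # 16 lead days from GEFS
cfsv2 = 2.0 * np.ones((2, 30))   # 30 lead days from CFSv2
merged = splice_forecasts(gefs, cfsv2)
areal = thiessen_areal_mean(merged, polygon_areas=[1.0, 3.0])
```

Here `areal` is a 30-day series of basin-average precipitation that could then be aggregated to weekly, bi-weekly, or monthly totals for evaluation.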

Conclusions
Based on daily observed precipitation from meteorological stations in the BJR and on GEFS and CFSv2 forecasts from NCEP, this study designed multiple schemes to correct the deviations of ensemble forecasts using the EPP method and evaluated the results from multiple aspects. In addition, the best scheme was selected to integrate GEFS and CFSv2 and to evaluate forecast accuracy at long lead times. The processed results were evaluated by R, RMSE, NSE, RH, and CRPSS, of which the first three indices were used to evaluate the ensemble mean and the last two to evaluate the ensemble distribution. The results show that the raw GEFS and CFSv2 forecasts are biased to varying degrees, and that EPP based on canonical events can improve their accuracy and provide a deterministic forecast. The performance largely depends on the design of the canonical events. In this study, Scheme 3 proved to be the best scheme for improving forecast accuracy and reducing uncertainty. However, CFSv2 is not very sensitive to the different schemes, as it mainly carries climate information for longer lead times. Considering these features of the post-processed results, this study integrated GEFS and CFSv2 through canonical events to improve forecast accuracy at long time scales (e.g., weekly and monthly). We found that post-processing the integrated data achieves better forecast results at longer time scales than post-processing CFSv2 alone.
In summary, the EPP used in this study can significantly improve the accuracy and reduce the uncertainty of ensemble forecasts, and it can also be a helpful solution for long-term precipitation forecasting. These results are important for meteorological and hydrological ensemble prediction: because the method relies on global data, it can be applied to many areas, and precipitation forecasts with higher precision can increase the accuracy and reduce the uncertainty of hydrological forecasts, which play a decisive role in preventing flood disasters. However, as the improvement achieved by EPP depends heavily on the design of the canonical events, the performance may vary among regions. Therefore, the events should be redesigned when the method is applied in other areas. In addition, EPP can also be used to process ensemble forecasts of other variables from GEFS and CFSv2; in follow-up research, these will be used to drive a hydrological model at different time scales to evaluate the resulting improvement in hydrological forecasting.