An Improved Weighting Method of Time-Lag-Ensemble Averaging for Hourly Precipitation Forecasts and Its Application in a Typhoon-Induced Heavy Rainfall Event

Abstract: Heavy rainfall events often cause great societal and economic impacts. The prediction ability of traditional extrapolation techniques decreases rapidly with the increase in the lead time. Moreover, deficiencies of high-resolution numerical models and high-frequency data assimilation will increase the prediction uncertainty. To address these shortcomings, based on the hourly precipitation prediction of Global/Regional Assimilation and Prediction System-Cycle of Hourly Assimilation and Forecast (GRAPES-CHAF) and Shanghai Meteorological Service-WRF ADAS Rapid Refresh System (SMS-WARR), we present an improved weighting method of time-lag-ensemble averaging for hourly precipitation forecast which gives more weight to heavy rainfall and can quickly select the optimal ensemble members for forecasting. In addition, by using the cross-magnitude weight (CMW) method, mean absolute error (MAE), root mean square error (RMSE) and correlation coefficient (CC), the verification results of hourly precipitation forecast for next six hours in Hunan Province during the 2019 typhoon Bailu case and heavy rainfall events from April to September in 2020 show that the revised forecast method can more accurately capture the characteristics of the hourly short-range precipitation forecast and improve the forecast accuracy and the probability of detection of heavy rainfall.


Introduction
Short, extreme rainfall events are a frequent occurrence in China, often with great societal and economic impacts. However, due to their small scale, short life cycle, abrupt onset and dissipation, and complex paths, forecasting and early warning have always been difficult [1,2]. In previous studies, short-range forecasts were mainly based on the extrapolation of radar echoes, satellite cloud images and other data. The mature methods used in China include extrapolation techniques and empirical prediction methods, such as traditional and improved optical flow methods and ingredients-based methodology [3–5]. Because these methods lack the physical mechanisms governing the evolution of severe convective systems, their prediction ability decreases rapidly with increasing forecast lead time, and extrapolation remains useful for less than one hour [6]. With the continuous development of numerical weather prediction techniques and the improvement of computer performance, especially the combination of extrapolation with high-resolution numerical prediction, the precipitation forecasting skill of mesoscale models has improved continuously [7,8].
Compared with deterministic model prediction, ensemble prediction can provide uncertain prediction information by using multiple initial values or models to estimate the

Brief Introductions of the Two Models
Global/Regional Assimilation and Prediction System-Cycle of Hourly Assimilation and Forecast (GRAPES-CHAF) is developed on the basis of the GRAPES-meso model. The IFS analysis field provided by the ECMWF model is used for the initial boundary data of GRAPES-CHAF. The initial guess field is obtained from the 3-h ECMWF forecast at 1200 UTC every day. The initial guess field at other times is from the 1-h GRAPES-CHAF forecast. The initial field is obtained by assimilating the observation data through GRAPES-3DVAR. The number of horizontal grid points is 634 × 434, and there are 55 layers in the vertical direction.
Shanghai Meteorological Service-WRF ADAS Rapid Refresh System (SMS-WARR) is developed on the basis of the WRF model, and the boundary field is derived from the GFS forecast field at 1200 UTC every day. The initial guess field is from the GFS 6-h forecast at 1200 UTC every day. The initial guess field at other times is from the SMS-WARR 1-h forecast. The initial field is obtained by assimilating the observation data through ADAS. The number of horizontal grid points is 525 × 625, and there are 51 layers in the vertical direction.
These two models are mainly aimed at short-range weather forecasting, with a 1 h forecast interval and a 24 h lead time. Radar, satellite, aircraft, ground and sounding observations are assimilated hourly. The horizontal resolution is 3 km. Since a resolution of 3 km can explicitly resolve cloud physical processes, no convection parameterization scheme is used. Table 1 shows the physical parameterization schemes used in the two models.

Study Area
The study area is Hunan Province, located in the middle reaches of the Yangtze River and surrounded by mountains and hills in the east, south, and west (Figure 1). It is an asymmetric horseshoe-shaped basin, high in the west and low in the east, high in the south and low in the north. Due to the obvious influence of terrain on rainfall, especially in mountainous regions [32][33][34], rainstorms occur frequently from April to September in Hunan Province.

Time-Lag-Ensemble Forecasting Method
Although the models can resolve mesoscale convective activities, they have some shortcomings. The time-lag-ensemble forecasting method integrates model forecasts that have different forecast reference times and lead times but are valid at the same time, and assigns them different weights to obtain the revised hourly precipitation forecast, which can effectively reduce the forecast error caused by the "spin-up" of the model and the uncertainty of the initial field. Based on the hourly precipitation products of GRAPES-CHAF and SMS-WARR within 24 h, the ensemble members of each of the two models are formed by the time-lag-ensemble method. Figure 2 shows the composition of the ensemble members at the current time (H). The formula for calculating the number of ensemble members (N) of the forecast field is as follows:

$$N = \mathrm{INT}\left(\frac{M - L - i}{dt}\right) \times n$$

where N is the number of ensemble members, INT is the rounding operator, M is the longest forecast lead time available from the model, L is the forecast lead time required from the current time H, dt is the prediction time interval of the model, i is the time lag caused by computation and data transmission, and n is the number of models.
For the two models, the longest forecast lead time is 24 h, the required forecast lead time (L) is the next six hours from the current time H, and the prediction time interval (dt) of the model is one hour. Nearly two hours pass before the model products are transmitted to Hunan, so the time lag is about two hours. The calculation therefore gives 16 ensemble members per model that can forecast the precipitation in the next six hours. Based on GRAPES-CHAF and SMS-WARR, the initial fields at different times at intervals of 1 h are used for prediction, and the ensemble members are then constructed from the prediction results valid at the same time (Figure 2). For example, the ensemble members include the −03Z, −04Z, −05Z, ..., −18Z runs of the two models, with a total of 32 members, which are denoted as M1, M2, M3, ..., M32, respectively.
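The member count described above can be checked with a short calculation. The sketch below assumes the count takes the form N = INT((M − L − i)/dt) × n, which is an assumption chosen because it reproduces the 16 members per model and 32 members in total quoted in the text. The function names are hypothetical, and the rule that the first usable run lags the current time by i + dt hours is likewise an assumption, made to match the −03Z to −18Z labels.

```python
def member_count(max_lead, target_lead, lag, dt, n_models):
    """Number of time-lagged members, assuming N = INT((M - L - i) / dt) * n."""
    return ((max_lead - target_lead - lag) // dt) * n_models


def member_lags(max_lead, target_lead, lag, dt):
    """Hours before the current time H at which each usable run was initialized.

    Assumption: the first usable run is the one initialized (lag + dt) hours ago.
    """
    per_model = member_count(max_lead, target_lead, lag, dt, n_models=1)
    first = lag + dt
    return [first + k * dt for k in range(per_model)]


# Values quoted in the text: M = 24 h, L = 6 h, i = 2 h, dt = 1 h, n = 2 models.
total = member_count(24, 6, 2, 1, 2)
lags = member_lags(24, 6, 2, 1)
print(total, lags[0], lags[-1])
```

With the quoted values, the per-model lags run from 3 h to 18 h, which corresponds to the −03Z to −18Z members of each model.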
Figure 3 compares how frequently each of the 32 time-lagged ensemble members of GRAPES-CHAF and SMS-WARR obtains the highest precipitation forecast score for the next six hours. In GRAPES-CHAF, the highest score occurs more frequently for members that are close to or far away from the forecast reference time, and few high scores occur for members in the middle. In SMS-WARR, the highest scores appear more frequently for members close to the forecast reference time, and there is little difference among the other members. The statistics show that although some members of the two models obtain the highest score when they are close to the forecast reference time, the probability is less than 20%. Because the models are continuously adjusted and updated, selecting the best forecast member manually costs a great deal of time. Through the time-lag-ensemble forecasting method, the revised precipitation forecast products can be provided directly, and the forecast efficiency can be greatly improved.

Optimal Integration Method of Cross-Magnitude Weight
How to correctly verify the prediction capacity of each member, so as to select the members with high forecast scores, is an important factor affecting the final ensemble revised forecast. In this study, on the basis of an objective classification of hourly precipitation intensity, we allocate different weights to precipitation of different magnitudes and establish a new verification method suitable for hourly precipitation, that is, a cross-magnitude weight method (Table 3).
Table 3. Cross-magnitude weight method *.

Rainstorm and extraordinary rainstorm    20 ≤ R    −0.8    0    0.5    1.5    4 (G − R ≤ 10)
* G is the observed precipitation, R is the predicted precipitation, and the range value in brackets is the scoring condition when the observed and predicted precipitation are of adjacent magnitude. When the difference between the observed and predicted precipitation is within this range, they are regarded as the same magnitude.
A high score indicates an accurate prediction. The weight of different grades of precipitation is different. The highest scores of light rain, moderate rain, heavy rain and rainstorm are 1, 1.5, 2 and 4, respectively.
When there is no precipitation observed but the model predicts precipitation, the score is calculated by the score reduction method, and the greater the deviation, the greater the score reduction. For example, if there is no precipitation observed and the forecast is light rain, moderate rain, heavy rain or rainstorm, the scores will be reduced by 0.2, 0.3, 0.5 and 0.8 points, respectively. When there is no precipitation in observation and forecast, there is no score.
The cross-magnitude weight method sets up a calculation method according to the scoring when the observed and the predicted precipitation are of adjacent magnitude. When the difference between them is small, it is considered that they are of the same magnitude, and the highest score is obtained according to the observed precipitation magnitude, instead of zero by the traditional scoring method.
Moreover, an underestimated or overestimated prediction can still be scored, but the score is lower than the highest score of the corresponding grade. For example, when a rainstorm is observed: if the forecast is heavy rain, the score is 1.5, which is lower than the highest score of the heavy rain grade (2 points); if the forecast is moderate rain, the score is 0.5, which is lower than the highest score of the moderate rain grade (1.5 points); and if the forecast is light rain, the score is 0, which is lower than the highest score of the light rain grade (1 point).
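The scoring rules stated in the text can be collected into a small lookup, shown here as a minimal sketch rather than the full Table 3. Only the values quoted in the text are encoded; any other combination of observed and forecast grade raises an error instead of guessing a score. The grade labels, the function name `cmw_score`, and the use of `None` for "no score" are assumptions made for illustration, and the adjacent-magnitude tolerance (for example G − R ≤ 10) is not implemented.

```python
from typing import Optional

# Highest score when the forecast grade matches the observed grade.
HIT_SCORE = {"light": 1.0, "moderate": 1.5, "heavy": 2.0, "rainstorm": 4.0}

# Score reduction when precipitation is forecast but none is observed.
FALSE_ALARM_SCORE = {"light": -0.2, "moderate": -0.3, "heavy": -0.5, "rainstorm": -0.8}

# Partial scores quoted for an observed rainstorm that is under-forecast.
OBSERVED_RAINSTORM_SCORE = {"light": 0.0, "moderate": 0.5, "heavy": 1.5}


def cmw_score(observed: str, forecast: str) -> Optional[float]:
    """Return the score for one observed/forecast grade pair, or None if unscored."""
    if observed == "none" and forecast == "none":
        return None  # no precipitation observed or forecast: no score
    if observed == forecast:
        return HIT_SCORE[observed]
    if observed == "none":
        return FALSE_ALARM_SCORE[forecast]
    if observed == "rainstorm" and forecast in OBSERVED_RAINSTORM_SCORE:
        return OBSERVED_RAINSTORM_SCORE[forecast]
    # Remaining entries of the score table are not given in the text.
    raise LookupError(f"score for observed={observed!r}, forecast={forecast!r} not stated")


print(cmw_score("rainstorm", "heavy"), cmw_score("none", "moderate"))
```

Keeping the unstated entries as errors makes it explicit which parts of the score table would still need to be filled in before the scheme could be applied to real data.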
Based on the hourly precipitation data of Hunan automatic weather stations, the hourly precipitation forecasts of the 16 ensemble members of each model, which have different forecast reference times, are dynamically evaluated over the past three hours by using the cross-magnitude weight method. The forecasts of the ensemble members with the top three scores are selected to revise the hourly forecast for the next six hours. The process is shown schematically in Figure 4.
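The selection step can be sketched as follows, assuming each member already has a single CMW score aggregated over the past three hours and that the revised forecast is the equally weighted mean of the three best members. How the scores are aggregated over stations and hours is not specified here, so the inputs, the member scores, the forecast values, and the function names are all hypothetical.

```python
def top_members(scores, k=3):
    """Return the names of the k members with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def revised_forecast(forecasts, selected):
    """Equally weighted mean of the selected members, hour by hour."""
    n_hours = len(forecasts[selected[0]])
    return [
        sum(forecasts[m][h] for m in selected) / len(selected)
        for h in range(n_hours)
    ]


# Hypothetical CMW scores of five members over the past three hours.
scores = {"M1": 0.05, "M2": 0.12, "M3": -0.02, "M4": 0.09, "M5": 0.07}

# Hypothetical hourly precipitation forecasts (mm) for the next six hours.
forecasts = {
    "M1": [1.0, 1.0, 2.0, 2.0, 1.0, 0.0],
    "M2": [0.0, 3.0, 6.0, 9.0, 3.0, 0.0],
    "M3": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    "M4": [3.0, 0.0, 6.0, 3.0, 3.0, 0.0],
    "M5": [0.0, 3.0, 6.0, 0.0, 3.0, 3.0],
}

selected = top_members(scores)
revised = revised_forecast(forecasts, selected)
print(selected, revised)
```

Because the scores are recomputed every hour, the selected members can change from one forecast cycle to the next without manual intervention.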

Results
In this paper, four forecast plans are designed, and their accuracy for hourly precipitation forecasting in Hunan in the next six hours is evaluated by using the CMW method, mean absolute error (MAE), root mean square error (RMSE) and correlation coefficient (CC). Table 4 lists the specific calculation methods of the plans.
Table 4. Four calculation plans of forecast products.

Programme    Method
Plan 1    Average forecast of ensemble members with the top three scores
Plan 2    Forecast of ensemble member with the highest score
Plan 3    Forecast at the nearest forecast reference time of GRAPES-CHAF
Plan 4    Forecast at the nearest forecast reference time of SMS-WARR
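For reference, the three conventional metrics used alongside the CMW score can be computed as in the sketch below, which uses their standard definitions (CC is taken to be the Pearson correlation coefficient, an assumption since the exact formula is not given). The sample observation and forecast values are hypothetical.

```python
import math


def mae(obs, fcst):
    """Mean absolute error between forecasts and observations."""
    return sum(abs(f - o) for o, f in zip(obs, fcst)) / len(obs)


def rmse(obs, fcst):
    """Root mean square error between forecasts and observations."""
    return math.sqrt(sum((f - o) ** 2 for o, f in zip(obs, fcst)) / len(obs))


def cc(obs, fcst):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_f = sum(fcst) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(obs, fcst))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_f = sum((f - mean_f) ** 2 for f in fcst)
    denom = math.sqrt(var_o * var_f)
    return cov / denom if denom > 0 else float("nan")


# Hypothetical hourly precipitation (mm) at four stations.
obs = [0.0, 0.0, 2.0, 10.0]
fcst = [0.0, 1.0, 2.0, 6.0]
print(mae(obs, fcst), rmse(obs, fcst), cc(obs, fcst))
```

In this sample the single heavy-rain station contributes most of the squared error, while the dry and light-rain stations contribute almost none.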


Revised Forecast for Typhoon Bailu
Hunan Province is one of the inland provinces which are often directly or indirectly affected by typhoons. The heavy rainfall caused by typhoons often leads to floods and other disasters, causing huge losses to the national economy and personal property. Due to many uncertainties in the development and evolution of heavy rainfall caused by the typhoon, the short-term forecast and nowcasting are very difficult [35,36]. Therefore, it is very important to verify the forecast ability of the revised forecasting method in typhoon rainstorms.
From 25 to 26 August 2019, Typhoon Bailu caused rainstorms in southeast Hunan, and the strongest precipitation occurred in the early morning of 26 August. The 36 h from 08:00 on the 25th to 20:00 on the 26th were selected as the study period. Table 5 shows the verification scores of the four forecast plans for hourly precipitation in the next six hours during the impact period (08:00 on 25 August to 20:00 on 26 August) and the strongest precipitation period (20:00 on 25 August to 02:00 on 26 August) of the typhoon in Hunan Province. The CMW score and CC of Plan 1 are the highest, reaching 0.078 and 0.239, respectively. Moreover, the MAE and RMSE of Plan 1 are 0.835 and 2.121, respectively, which are the lowest among the four plans. It should be noted that the Probability of Detection (POD) of Plan 1 for short-range precipitation reaches 50.535%, which is higher than that of the other three plans, while its False Alarm Ratio (FAR) is lower than that of the other three plans. The score of the revised forecast is therefore significantly higher than those of the latest time forecast and the single-member forecast, especially in the strongest precipitation period. In the period of the strongest rainfall, Plan 1 is more accurate for the area of 25 mm and above, while the other plans differ from the observation for the area of heavy rain and above. In particular, for Plan 3 and Plan 4, the areas with forecast rainfall greater than 50 mm lie to the east and north of the observed areas, respectively. Moreover, the forecast is obviously too low in the areas where heavy rain is observed (Figure 5). Although the FAR of Plan 1 is slightly higher than that of Plan 2, the POD is significantly improved compared to Plan 2, and both POD and FAR are better than those of Plan 3 and Plan 4. Furthermore, the CMW, MAE, RMSE and CC of Plan 1 are 0.081, 0.813, 2.162 and 0.185, respectively, and these four statistical indicators are better than those of the other three plans (Table 5).
It is worth mentioning that the prediction effect of Plan 1 in the strongest precipitation period of the typhoon is better than that in the whole influence period of the typhoon. The revised forecast improves the ability of the rapid update cycle (RUC) forecast models to predict the heavy precipitation process.
A comparison of the hourly forecast of the ensemble member with the highest score, the equally weighted average forecast of the ensemble members with the top three scores, and the forecasts at the nearest forecast reference time (Figure 6) shows that the score of Plan 1 is the highest in the early, middle and late stages of typhoon precipitation. The scores of Plan 2 and Plan 4 are higher than that of Plan 3 in the early stage of typhoon precipitation. In the middle stage of typhoon precipitation, there is little difference among them, but Plan 4 has almost no ability to predict the later stage of typhoon precipitation.

Figure 6. Verification of hourly precipitation forecast by using CMW method in the next six hours.

Statistical Verification for the Period from April to September in 2020
To determine whether the precipitation forecast score is improved by the CMW method, we compared the verification scores of the four forecast plans for hourly precipitation in the next six hours from April to September 2020 (Table 6). The results show that the statistical scores of the average precipitation forecast of the top three members (Plan 1) selected by the cross-magnitude weight method are higher than those of the other three plans. The CMW score of Plan 1 ranks first (0.034), followed by Plan 2 and Plan 4 (0.026 and 0.021), and that of Plan 3 is the lowest (0.014). Moreover, the MAE and RMSE of Plan 3 are 1.057 and 2.823, respectively, much higher than those of the other three plans. However, the differences in MAE and RMSE among Plan 1, Plan 2 and Plan 4 are very small. Taking Plan 4 as an example, its CMW score is lower than that of Plan 1, but it performs slightly better than Plan 1 in terms of MAE and RMSE. The reason for this inconsistency is that in the precipitation verification of long time series, cases with no or light precipitation usually account for the majority, and the error between observation and forecast in these cases is small, which makes the RMSE and MAE values small. For the same reason, Plan 4 has the highest CC score. However, these metrics are not good enough to evaluate heavy rainfall. The CMW score of Plan 1 is higher than that of the others because the method gives different weights to different magnitudes of precipitation: the greater the precipitation magnitude, the greater the weight. A further comparison of the POD and FAR of the four plans for short-range heavy rainfall (greater than 20 mm/h) shows that the POD of Plan 1 (32.967%) is obviously higher than that of the other three plans, while there is little difference in FAR among the four plans. Therefore, Plan 1 has the highest CMW score because its forecast of heavy precipitation is the best, and the forecast of heavy precipitation is usually the focus of the short-range forecast.
By calculating FAR, it can be found that the FAR of the four plans is not much different, but the POD of Plan 1 is much higher than the other three plans.
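The POD and FAR comparison for heavy rainfall can be illustrated with the standard contingency-table definitions, POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms). The sketch below assumes an event is defined as hourly precipitation of at least 20 mm, in line with the 20 mm/h heavy-rainfall threshold mentioned above; whether the boundary value itself counts as an event is an assumption, and the sample values are hypothetical.

```python
def pod_far(obs, fcst, threshold=20.0):
    """Return (POD, FAR) for events at or above the threshold (mm/h)."""
    hits = sum(1 for o, f in zip(obs, fcst) if o >= threshold and f >= threshold)
    misses = sum(1 for o, f in zip(obs, fcst) if o >= threshold and f < threshold)
    false_alarms = sum(1 for o, f in zip(obs, fcst) if o < threshold and f >= threshold)
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    return pod, far


# Hypothetical hourly precipitation (mm) at five stations.
obs = [25.0, 30.0, 5.0, 0.0, 22.0]
fcst = [21.0, 10.0, 24.0, 0.0, 40.0]
pod, far = pod_far(obs, fcst)
print(pod, far)
```

Unlike MAE and RMSE, both ratios ignore the many dry and light-rain cases, which is why they can separate the plans when the error metrics barely differ.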

Conclusions and Discussion
In this study, based on the hourly precipitation forecast of the GRAPES-CHAF and SMS-WARR models, 16 different ensemble members are constructed for each model by using the time-lag-ensemble method. Meanwhile, on the basis of the statistics of hourly rainfall intensity in Hunan Province, considering the accuracy of wet or dry prediction, the score weight of precipitation with different magnitudes and the score weight of the difference between observation and forecast, we present an optimal integration method of cross-magnitude weight for the hourly precipitation forecast in the next six hours. The revised precipitation forecast products can be directly provided, and the forecast efficiency can be greatly improved.
To evaluate the accuracy of the new revision method, we designed four forecast plans to verify the hourly precipitation forecasts during typhoon Bailu in 2019 and from April to September 2020 by using CMW, MAE, RMSE and CC. The four plans are the average forecast of the ensemble members with the top three scores, the forecast of the ensemble member with the highest score, the forecast at the nearest forecast reference time of GRAPES-CHAF, and the forecast at the nearest forecast reference time of SMS-WARR. The results show that the score of the improved time-lag-ensemble forecasting method is significantly higher than those of the latest time forecast and the single-member forecast, especially for precipitation areas greater than 25 mm. Heavy rainfall forecast is usually the focus of a short-range forecast. As the CMW method gives higher weight to heavy rainfall, the POD is obviously higher than that of the latest time forecasts of GRAPES-CHAF and SMS-WARR.
The objective forecast revision method in this paper can quickly select the optimal model members and has a good application prospect in short-range hourly precipitation forecasting (especially rainstorm forecasting). Moreover, the method is not sensitive to terrain and other underlying surface information and can be applied in other regions. We must acknowledge that there is a significant weakness in this analysis. This study mainly focuses on the correction of precipitation forecasts, which largely depends on the prediction ability of the model itself. The revision ability is insufficient when the deviation of the forecast rainfall position of the model is large. Jeong [37] proposed a statistical parameter correction technique to correct the daily distribution of the predicted near-surface temperature and wind speed. In a future study, we will try to revise the hourly precipitation distribution to further improve the POD of short-range heavy precipitation. In addition, more physical quantities and massive data can be utilized in the future to explore objective revision methods for quantitative precipitation forecasts based on machine learning.