Improving the Spring Air Temperature Forecast Skills of BCC_CSM1.1 (m) by Spatial Disaggregation and Bias Correction: Importance of Trend Correction

In this study, an improved method, spatial disaggregation and detrended bias correction (SDDBC), was proposed by combining spatial disaggregation and bias correction (SDBC) with trend correction. Using data from meteorological stations over China from 1991 to 2020 and the seasonal hindcast data from the Beijing Climate Center Climate System Model (BCC_CSM1.1 (m)), the performances of the model, SDBC, and SDDBC in spring temperature forecasts were evaluated. The results showed that the observed spring temperature exhibits a significant increasing trend in most of China, whereas the warming trend simulated by the model was clearly smaller. SDBC performed poorly in temperature trend correction. With SDDBC, the model's deviation in temperature trend was corrected; consequently, the temporal correlation between the simulation and the observations and the forecast skill for the phase of temperature both improved, which raised the mean squared skill score (MSSS) and the anomaly correlation coefficient (ACC). From the perspective of probabilistic prediction, the relative operating characteristic skill score (ROCSS) and the Brier skill score (BSS) of SDDBC for three categorical forecasts were higher than those of the model and SDBC. The BSS of SDDBC increased because the gain from the resolution component outweighed the loss from the reliability component. Therefore, it is necessary to correct the predicted temperature trend when post-processing the output of numerical prediction models.


Introduction
Climate system models (CSMs) have become the main tool for climate prediction around the world [1][2][3]. Recently, the Beijing Climate Center (BCC) of the China Meteorological Administration (CMA) has improved the physics and resolution of its operational CSM and updated the forecast system to a second-generation climate system model (BCC_CSM) [2]. Recent studies have used the archived BCC_CSM reforecasts for different applications, such as evaluating the forecast skill of Asian-Western Pacific summer monsoon [4], Asian summer monsoon [5], Madden-Julian oscillation [6], summer precipitation [7], synoptic eddy and low-frequency flow [8], Indian Ocean basin mode and dipole mode [9], stratospheric sudden warming [10], primary East Asian summer circulation patterns [11], and winter temperature [12]. The model has shown a considerable ability to predict important climate phenomena, tropical large-scale atmospheric circulation anomalies and primary climate variability modes. However, the prediction skill of weak anomaly signals and atmospheric circulation in middle and high latitudes still needs to be improved.

Observed Data
The data used in this study were the boreal spring (March to May) mean air temperature data for the period 1991 to 2020 at 160 meteorological stations over China from the China Meteorological Administration. The spatial distribution of the 160 meteorological stations is shown in Figure 2.

Model Data
The BCC_CSM1.1 (m) model used in this study was developed by the Beijing Climate Center (BCC) of the China Meteorological Administration. This model consists of fully coupled components of the atmosphere, ocean, ice, and land and has been applied in research on climate change projection and climate prediction at the BCC [3]. The BCC_CSM shows a reliable performance in short-term climate prediction [4,5]. The hindcasts and forecasts of the model were initiated on the first day of each month from 1991 to 2020. In total, 24 ensemble members were used to predict the monthly average atmospheric circulation and surface climatic factors for the next 13 months, at a resolution of 1° × 1°. In this study, the spring air temperature forecasts initiated on 1 March were used. The deterministic forecast is the ensemble mean of the 24 members. The climatology of both the observations and the model is the average over 1991 to 2010.
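The ensemble-mean forecast and its anomaly relative to the 1991 to 2010 climatology can be sketched as follows. This is a minimal illustration on synthetic numbers, not the authors' processing code; the array layout `(n_years, n_members, n_points)` and the function name `ensemble_mean_and_anomaly` are assumptions made for the example.

```python
import numpy as np

def ensemble_mean_and_anomaly(hindcast, years, clim_start=1991, clim_end=2010):
    """hindcast: array (n_years, n_members, n_points) of spring-mean temperature."""
    hindcast = np.asarray(hindcast, dtype=float)
    years = np.asarray(years)
    # Deterministic forecast: average over the ensemble members.
    ens_mean = hindcast.mean(axis=1)
    # Climatology: mean of the ensemble mean over the reference years only.
    in_ref = (years >= clim_start) & (years <= clim_end)
    climatology = ens_mean[in_ref].mean(axis=0)
    return ens_mean, ens_mean - climatology

# Synthetic stand-in for a 24-member hindcast (values are made up).
rng = np.random.default_rng(0)
years = np.arange(1991, 2021)
hindcast = 10.0 + rng.normal(size=(years.size, 24, 5))
ens_mean, anomaly = ensemble_mean_and_anomaly(hindcast, years)
```

By construction, the anomalies average to zero over the reference period, while years after 2010 are measured against the same fixed baseline.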

(a) SDBC
The spatial disaggregation and bias correction (SDBC) method [31] has two steps. In the first step, the model forecasts are interpolated to each station using inverse distance weighting (IDW). The IDW uses the four nearest grid points as control points, and the weighting function is the inverse power of the distance (a power of 2 was used in this study). In the second step, the interpolated model data are bias-corrected against the station's observation data using the quantile mapping technique [33,34]. The bias-corrected data at time i at station j are calculated as follows:
$$\hat{x}_{i,j} = F^{-1}_{o,c}\left(F_{f,c}\left(x_{f,i,j}\right)\right) \qquad (1)$$
where $F(x)$ and $F^{-1}(x)$ denote the cumulative distribution function (CDF) of the data and its inverse, respectively; the subscripts f and o indicate model forecasts and observation data, respectively; and the subscript c indicates the calibration period. Cross-validation is conducted by leaving the target year out when creating the CDFs of the observation data.
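The two SDBC steps can be sketched for a single station as below. This is a hedged illustration, not the authors' implementation: distances are taken as planar distances in degrees rather than great-circle distances, the CDFs are empirical with a simple plotting position, and the helper names (`idw_to_station`, `quantile_map`, `sdbc_station`) and the synthetic data are hypothetical.

```python
import numpy as np

def idw_to_station(grid_lon, grid_lat, grid_values, st_lon, st_lat, k=4, power=2):
    """Interpolate grid_values (n_times, n_grid) to one station with IDW."""
    grid_lon = np.asarray(grid_lon, dtype=float)
    grid_lat = np.asarray(grid_lat, dtype=float)
    grid_values = np.asarray(grid_values, dtype=float)
    # Assumption: planar distance in degrees, a simplification for the sketch.
    dist = np.hypot(grid_lon - st_lon, grid_lat - st_lat)
    nearest = np.argsort(dist)[:k]            # the k nearest control points
    if dist[nearest[0]] == 0.0:               # station sits exactly on a grid point
        return grid_values[:, nearest[0]]
    weights = dist[nearest] ** (-power)       # inverse-distance weights
    return grid_values[:, nearest] @ (weights / weights.sum())

def quantile_map(x, fcst_cal, obs_cal):
    """Empirical quantile mapping: F_o^{-1}(F_f(x))."""
    fs = np.sort(np.asarray(fcst_cal, dtype=float))
    probs = (np.arange(1, fs.size + 1) - 0.5) / fs.size
    p = np.interp(x, fs, probs)               # forecast CDF, clipped at the ends
    return np.quantile(obs_cal, p)            # inverse observed CDF

def sdbc_station(fcst, obs):
    """Bias-correct each year, leaving that year out of the observed CDF."""
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    out = np.empty(fcst.size)
    for i in range(fcst.size):
        keep = np.arange(obs.size) != i
        out[i] = quantile_map(fcst[i], fcst, obs[keep])
    return out

# Synthetic example: five grid points, one far away that k=4 must ignore.
grid_lon = np.array([0.0, 1.0, 0.0, 1.0, 5.0])
grid_lat = np.array([0.0, 0.0, 1.0, 1.0, 5.0])
rng = np.random.default_rng(1)
obs = 12.0 + rng.normal(size=30)
# Made-up model signal: cold bias of about 4 degC and damped variability.
signal = 8.0 + 0.5 * (obs - 12.0) + 0.2 * rng.normal(size=30)
grid_values = np.tile(signal[:, None], (1, 5))
grid_values[:, 4] = 100.0
fcst_station = idw_to_station(grid_lon, grid_lat, grid_values, 0.5, 0.5)
corrected = sdbc_station(fcst_station, obs)
```

Because quantile mapping only reshapes the forecast distribution, the year-to-year ordering of the forecasts is preserved; any trend error in the forecast ranks therefore survives the correction.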

(b) SDDBC
Because the model cannot accurately simulate temperature trends, SDBC was modified to reduce the effect of trend simulation errors on the forecasts: the trends of the forecasts and observations are removed before bias correction, and the observed trend is added back afterwards. The modified method is called spatial disaggregation and detrended bias correction (SDDBC). This method has four steps. In the first step, the model forecasts are interpolated to each station using IDW. In the second step, the interpolated model data and the observations are detrended (Equations (2) and (3)). In the third step, the detrended model data are bias-corrected using the quantile mapping technique based on the detrended observations. In the final step, the observed trend is added to the bias-corrected, downscaled, and detrended model data. The SDDBC method therefore corrects not only the mean and variance of the prediction in probability space but also the trend. The bias-corrected data at time i at station j are calculated as follows:
$$\Delta x_{f,i,j} = x_{f,i,j} - f\left(x_{f,i,j}\right) \qquad (2)$$
$$\Delta x_{o,i,j} = x_{o,i,j} - f\left(x_{o,i,j}\right) \qquad (3)$$
$$\hat{x}_{i,j} = F^{-1}_{\Delta o,c}\left(F_{\Delta f,c}\left(\Delta x_{f,i,j}\right)\right) + f\left(x_{o,i,j}\right) \qquad (4)$$
where $f(x)$ is the optimal trend fit of the data, here the least-squares linear trend, and $\Delta x$ represents the data after removing the linear trend. In steps 2-4, cross-validation is conducted by leaving the target year out.
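Steps 2 to 4 of SDDBC can be sketched for one station that has already been interpolated. This is an illustration under stated assumptions rather than the authors' code: the trend is a least-squares straight line refitted with the target year left out, the CDFs are empirical, and the function names and synthetic series (observed trend 0.8 °C/decade, forecast trend 0.2 °C/decade) are made up for the example.

```python
import numpy as np

def quantile_map(x, fcst_cal, obs_cal):
    """Empirical quantile mapping: F_o^{-1}(F_f(x))."""
    fs = np.sort(np.asarray(fcst_cal, dtype=float))
    probs = (np.arange(1, fs.size + 1) - 0.5) / fs.size
    return np.quantile(obs_cal, np.interp(x, fs, probs))

def slope_per_year(years, series):
    """Least-squares linear trend (units per year)."""
    t = np.asarray(years, dtype=float) - np.mean(years)
    return np.polyfit(t, np.asarray(series, dtype=float), 1)[0]

def sddbc_station(years, fcst, obs):
    """Detrend, quantile-map the residuals, then add the observed trend back."""
    t = np.asarray(years, dtype=float) - np.mean(years)  # centred for stability
    fcst = np.asarray(fcst, dtype=float)
    obs = np.asarray(obs, dtype=float)
    out = np.empty(t.size)
    for i in range(t.size):
        keep = np.arange(t.size) != i                    # leave the target year out
        coef_f = np.polyfit(t[keep], fcst[keep], 1)      # forecast trend line
        coef_o = np.polyfit(t[keep], obs[keep], 1)       # observed trend line
        # Step 2: remove the linear trends.
        res_f_cal = fcst[keep] - np.polyval(coef_f, t[keep])
        res_o_cal = obs[keep] - np.polyval(coef_o, t[keep])
        res_f_i = fcst[i] - np.polyval(coef_f, t[i])
        # Step 3: bias-correct the detrended forecast.
        res_corrected = quantile_map(res_f_i, res_f_cal, res_o_cal)
        # Step 4: add the observed trend evaluated at the target year.
        out[i] = res_corrected + np.polyval(coef_o, t[i])
    return out

# Synthetic station series: the forecast is too cold and warms too slowly.
rng = np.random.default_rng(2)
years = np.arange(1991, 2021)
t0 = years - years[0]
noise = rng.normal(size=years.size)
obs = 10.0 + 0.08 * t0 + 0.3 * noise
fcst = 6.0 + 0.02 * t0 + 0.3 * (0.8 * noise + 0.6 * rng.normal(size=years.size))
corrected = sddbc_station(years, fcst, obs)
```

In this toy case the corrected series inherits the observed warming rate, whereas a quantile mapping applied to the raw forecasts would keep the forecast's weaker trend in rank space.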

(a) RMSE
The root mean square error (RMSE) reflects the difference between forecasts and observations, with smaller values indicating better accuracy [51]. RMSE was calculated as follows [52]:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(f_i - o_i\right)^2}$$
where $o_i$ represents the observation data, $f_i$ represents the model forecasts or model-corrected forecasts, and n is the number of data points.
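A short sketch of the RMSE calculation is given below; the function name `rmse` and the sample values are illustrative only. The example also shows that a constant offset alone produces an RMSE equal to the offset, which is why a large systematic bias dominates the RMSE of uncorrected forecasts.

```python
import numpy as np

def rmse(forecast, observation):
    """Root mean square error between paired forecasts and observations."""
    forecast = np.asarray(forecast, dtype=float)
    observation = np.asarray(observation, dtype=float)
    return float(np.sqrt(np.mean((forecast - observation) ** 2)))

obs = np.array([10.0, 11.0, 12.0, 13.0])
cold_biased = obs - 4.0          # made-up forecast with a constant 4 degC cold bias
rmse_biased = rmse(cold_biased, obs)
```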

(b) ACC and TCC
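The anomaly correlation coefficient (ACC) reflects the similarity of anomalous spatial patterns between forecasts and observations [8]. The ACC for year j was calculated as follows [53]:
$$\mathrm{ACC}_j = \frac{\sum_{i=1}^{m}\left(\Delta o_{i,j} - \overline{\Delta o}_j\right)\left(\Delta f_{i,j} - \overline{\Delta f}_j\right)}{\sqrt{\sum_{i=1}^{m}\left(\Delta o_{i,j} - \overline{\Delta o}_j\right)^2 \sum_{i=1}^{m}\left(\Delta f_{i,j} - \overline{\Delta f}_j\right)^2}}$$
where $\Delta o_{i,j}$ and $\Delta f_{i,j}$ represent the observation and forecast anomalies for year j at station i, respectively; $\overline{\Delta o}_j$ and $\overline{\Delta f}_j$ are the spatial averages of the observation and forecast anomalies, respectively; and m is the number of stations. The temporal correlation coefficient (TCC) is used to measure the forecast skill at each station. The TCC at station i was calculated as follows [53]:
$$\mathrm{TCC}_i = \frac{\sum_{j=1}^{n}\left(o_{i,j} - \bar{o}_i\right)\left(f_{i,j} - \bar{f}_i\right)}{\sqrt{\sum_{j=1}^{n}\left(o_{i,j} - \bar{o}_i\right)^2 \sum_{j=1}^{n}\left(f_{i,j} - \bar{f}_i\right)^2}}$$
where $o_{i,j}$ and $f_{i,j}$ represent the observations and forecasts for year j at station i, respectively; $\bar{o}_i$ and $\bar{f}_i$ are the time averages of the observations and forecasts, respectively; and n is the number of years. ACC and TCC range between −1 and 1. The closer they are to 1, the higher the forecast skill.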
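ACC and TCC are the same centred correlation taken along different axes, which the following sketch makes explicit. The `(n_years, n_stations)` array layout, the function names, and the random data are assumptions for illustration.

```python
import numpy as np

def _centered_corr(a, b, axis):
    """Pearson correlation of a and b along the given axis."""
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    return (a * b).sum(axis=axis) / np.sqrt((a ** 2).sum(axis=axis) * (b ** 2).sum(axis=axis))

def acc_by_year(obs_anom, fcst_anom):
    """Spatial pattern correlation: one value per year (correlate over stations)."""
    return _centered_corr(np.asarray(obs_anom, float), np.asarray(fcst_anom, float), axis=1)

def tcc_by_station(obs, fcst):
    """Temporal correlation: one value per station (correlate over years)."""
    return _centered_corr(np.asarray(obs, float), np.asarray(fcst, float), axis=0)

# Synthetic anomalies for 30 years at 8 stations (values are made up).
rng = np.random.default_rng(3)
obs = rng.normal(size=(30, 8))
fcst = 0.6 * obs + 0.8 * rng.normal(size=(30, 8))
acc = acc_by_year(obs, fcst)       # shape (30,)
tcc = tcc_by_station(obs, fcst)    # shape (8,)
```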
(c) MSSS
The mean squared skill score (MSSS) is a relative skill measure that compares model forecasts with the climatology forecast. MSSS is calculated as follows [54]:
$$\mathrm{MSSS} = 1 - \frac{\sum_{j} w_j\,\mathrm{MSE}_j}{\sum_{j} w_j\,\mathrm{MSE}_{cj}}$$
where $\mathrm{MSE}_j$ is the mean squared error of the model forecasts at station j, $\mathrm{MSE}_{cj}$ is the mean squared error of the climatology forecasts, and $w_j = \cos(\theta_j)$, where $\theta_j$ is the latitude of station j. MSSS ranges from −∞ to 1.0, with a value of 0 indicating that the forecast has equivalent skill to climatology, negative values indicating that the forecast has less skill than climatology, and a value of 1.0 indicating a perfect forecast. $\mathrm{MSSS}_j$ for fully cross-validated forecasts can be expanded as follows [55]:
$$\mathrm{MSSS}_j = \left\{ 2\frac{s_{fj}}{s_{xj}} r_{fxj} - \left(\frac{s_{fj}}{s_{xj}}\right)^2 - \left(\frac{\bar{f}_j - \bar{x}_j}{s_{xj}}\right)^2 + \frac{2n-1}{(n-1)^2} \right\} \Bigg/ \left\{ 1 + \frac{2n-1}{(n-1)^2} \right\}$$
where $r_{fxj}$ is the product-moment correlation of the forecasts and observations at station j; $\bar{x}_j$ and $\bar{f}_j$ are the mean values and $s_{xj}$ and $s_{fj}$ the standard deviations of the observations and forecasts, respectively; and n is the number of years. The first three terms of the decomposition of $\mathrm{MSSS}_j$ are related to phase skill (through the correlation), amplitude errors (through the ratio of the forecast to observed variances), and the overall bias error of the forecasts [54].
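The relationship between the direct MSSS and its phase, amplitude, and bias terms can be checked numerically. The sketch below assumes a leave-one-out climatology forecast and population (1/n) standard deviations, under which the expansion holds exactly; the function names and the synthetic series are illustrative.

```python
import numpy as np

def msss_station(fcst, obs):
    """MSSS at one station against a leave-one-out climatology forecast."""
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    n = obs.size
    mse_f = np.mean((fcst - obs) ** 2)
    clim_loo = (obs.sum() - obs) / (n - 1)     # climatology without the target year
    mse_c = np.mean((obs - clim_loo) ** 2)
    return 1.0 - mse_f / mse_c

def msss_decomposed(fcst, obs):
    """Return (MSSS, phase term, amplitude term, bias term)."""
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    n = obs.size
    s_f, s_x = fcst.std(), obs.std()           # population standard deviations
    r = np.corrcoef(fcst, obs)[0, 1]
    k = (2 * n - 1) / (n - 1) ** 2             # cross-validation adjustment
    phase = 2.0 * (s_f / s_x) * r
    amplitude = (s_f / s_x) ** 2
    bias = ((fcst.mean() - obs.mean()) / s_x) ** 2
    return (phase - amplitude - bias + k) / (1.0 + k), phase, amplitude, bias

def msss_overall(fcst, obs, lat_deg):
    """Latitude-weighted MSSS for arrays shaped (n_years, n_stations)."""
    fcst, obs = np.asarray(fcst, float), np.asarray(obs, float)
    n = obs.shape[0]
    w = np.cos(np.deg2rad(lat_deg))
    mse_f = np.mean((fcst - obs) ** 2, axis=0)
    clim_loo = (obs.sum(axis=0) - obs) / (n - 1)
    mse_c = np.mean((obs - clim_loo) ** 2, axis=0)
    return 1.0 - (w * mse_f).sum() / (w * mse_c).sum()

# Synthetic anomalies for 30 years at 4 stations (values are made up).
rng = np.random.default_rng(4)
obs = rng.normal(size=(30, 4))
fcst = 0.5 * obs + 0.5 * rng.normal(size=(30, 4))
direct = msss_station(fcst[:, 0], obs[:, 0])
expanded, phase, amplitude, bias = msss_decomposed(fcst[:, 0], obs[:, 0])
overall = msss_overall(fcst, obs, lat_deg=np.array([25.0, 35.0, 40.0, 45.0]))
```

Since the amplitude and bias terms enter with a negative sign, a method raises the MSSS only if its gain in the phase term exceeds any growth in those two terms.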
(d) ROCSS
The relative operating characteristic (ROC) curve describes the relationship between hit rate (HR) and false alarm rate (FAR), with different sorted ensemble members used as decision thresholds [54]. The ROC is plotted with HR on the ordinate and FAR on the abscissa. The area under the ROC curve (AUC) can be used to calculate a probabilistic skill score. The AUC is approximated as follows [56]:
$$\mathrm{AUC} = \sum_{i=1}^{n} \frac{\left(\mathrm{HR}_i + \mathrm{HR}_{i-1}\right)\left(\mathrm{FAR}_i - \mathrm{FAR}_{i-1}\right)}{2}$$
where $\mathrm{HR}_i$ and $\mathrm{FAR}_i$ are the hit rate and the false alarm rate for probability bin i, respectively (with $\mathrm{HR}_0 = \mathrm{FAR}_0 = 0$), and n is the number of probability bins. The ROC skill score (ROCSS) is calculated from the AUC [53]:
$$\mathrm{ROCSS} = 2\,\mathrm{AUC} - 1$$
(e) BSS
The Brier skill score (BSS) was employed to evaluate the skill of probabilistic forecasts in terciles (above normal, near normal, and below normal) for each station. The BSS is written as follows [53]:
$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}_f}{\mathrm{BS}_c} = \mathrm{BSS}_{res} - \mathrm{BSS}_{rel}, \qquad \mathrm{BSS}_{res} = \frac{\mathrm{BS}_{res}}{\mathrm{BS}_c}, \qquad \mathrm{BSS}_{rel} = \frac{\mathrm{BS}_{rel}}{\mathrm{BS}_c}$$
where $\mathrm{BS}_f$ and $\mathrm{BS}_c$ represent the Brier score (BS) of the forecast and climatology, respectively; $\mathrm{BSS}_{res}$ and $\mathrm{BSS}_{rel}$ are the resolution component and the reliability component of the BSS, respectively; and $\mathrm{BS}_{res}$ and $\mathrm{BS}_{rel}$ are the resolution component and the reliability component of the BS, respectively. The BSS ranges between −∞ and 1.0; a value of 1 indicates perfect skill and a value of 0 indicates that the skill of the forecast is equivalent to climatology.

Air Temperature Trend
Figure 3 shows the spatial distribution of spring air temperature trends from observations and the BCC_CSM. Both observed and simulated temperatures increased, but at different rates. Overall, the average rate of increase in the observed air temperature over China was 0.49 °C/decade, significantly higher than the simulated rate of 0.3 °C/decade. Excluding parts of South China and Southwest China, the simulated warming rates were lower than the observed rates in most regions. Larger differences were found in North China, Central China, Northwest China, and East China, ranging between 0.22 and 0.25 °C/decade. North China showed the largest difference: the simulated trend was 0.31 °C/decade, significantly lower than the observed rate of 0.55 °C/decade. Differences were smaller in South China, Southwest China, and Northeast China, ranging between 0.09 and 0.14 °C/decade. This suggests that the BCC_CSM underestimated the increasing trend of spring air temperature in most parts of China.
Moreover, the difference in air temperature trends between the observations and the BCC_CSM caused the model error to change from year to year. The model error of spring air temperature over China increased significantly at an average rate of 0.18 °C/decade. Rates higher than 0.2 °C/decade were found over North China, Central China, Northwest China, and East China. The rates over Southwest China, Northeast China, and South China were lower. Therefore, it is necessary to correct the warming trend when post-processing the model forecasts in order to reduce the forecast error and improve the forecast skill.
The spatial distribution of the temperature trend of the SDBC method was consistent with that of the BCC_CSM. The trends at most stations were also below the observed rates, since SDBC did not modify the underestimated warming rate of the BCC_CSM. In contrast, the temperature trends from the SDDBC method were close to the observed trends. Thus, the SDDBC method effectively resolved the underestimation of the spring air temperature trend by the BCC_CSM.
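The link between a trend mismatch and a growing model error can be illustrated with a least-squares trend expressed in °C per decade. The sketch below uses noise-free synthetic series whose slopes are set to the national mean rates quoted above (0.49 and 0.30 °C/decade); the intercepts and the function name `trend_per_decade` are made up for the example.

```python
import numpy as np

def trend_per_decade(years, series):
    """Least-squares linear trend, converted from per year to per decade."""
    years = np.asarray(years, dtype=float)
    slope = np.polyfit(years - years.mean(), np.asarray(series, dtype=float), 1)[0]
    return 10.0 * slope

years = np.arange(1991, 2021)
t = years - years[0]
obs = 10.0 + 0.049 * t       # idealized observed series, 0.49 degC/decade
model = 6.0 + 0.030 * t      # idealized model series, 0.30 degC/decade
error = obs - model          # size of the cold bias in each year

obs_trend = trend_per_decade(years, obs)
model_trend = trend_per_decade(years, model)
error_trend = trend_per_decade(years, error)
```

Because the least-squares slope is linear in the data, the trend of the error series equals the difference between the observed and simulated trends, so an underestimated warming rate translates directly into a bias that grows over the hindcast period.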

Deterministic Evaluation of Forecast Skill
The BCC_CSM forecasts systematically underestimated the spring temperature over China and the seven sub-regions, and the bias was 3.88 °C on average over China. A larger systematic bias of over 4 °C was found over Northwest China, Northeast China, North China, and Southwest China. The greatest systematic bias was 7.26 °C in Northwest China, while the smallest was in South China with a value of 0.87 °C. SDBC and SDDBC effectively eliminated the systematic temperature biases and presented almost no bias in all seven sub-regions.

(a) RMSE
Figure 4 shows the spatial distribution of the root mean square error (RMSE) for the three methods. The RMSE of spring air temperature for the BCC_CSM ranged from 0.61 to 14.4 °C, averaging 4.98 °C over the whole country. RMSE was larger in West and North China than in East and South China. The largest RMSE was detected in Northwest China (7.79 °C), followed by Southwest China (5.59 °C), and South China had the smallest error (1.78 °C). RMSE values for the SDBC and SDDBC methods ranged between 0.49 and 1.66 °C and between 0.48 and 1.71 °C, respectively, averaging 0.89 and 0.87 °C over China. The spatial distributions of RMSE for these two methods were very similar. Values higher than 1 °C were found in Northeast China and the western part of Northwest China, while the values ranged from 0.50 to 1.00 °C in most other areas. Both SDBC and SDDBC greatly reduced the RMSE relative to the BCC_CSM, indicating a useful correction of the model output, and the mean RMSE of SDDBC was slightly smaller than that of SDBC.
(b) TCC and ACC
Figure 5 shows the spatial distribution of the temporal correlation coefficient (TCC) of the spring air temperature forecasts for the BCC_CSM and the SDBC and SDDBC methods. The BCC_CSM was skillful, with TCC values ranging between 0.09 and 0.72 over China. The TCC in North China was significantly higher than that in South China. TCCs above 0.4 (significant at the 5% level) indicated higher skill of the BCC_CSM in Northeast China, North China, Northwest China, and northern Central China. In particular, the TCCs for the northern part of Northwest China, the northern part of East China, the southern part of North China, and the northern part of Central China were greater than 0.5.
The TCC for the SDBC and SDDBC methods was also clearly higher in North China than in South China, with a spatial distribution similar to that of the BCC_CSM. However, the area with a TCC above 0.4 narrowed for SDBC and expanded for SDDBC. Moreover, SDDBC was more skillful in the southern part of North China, the northern part of East China, and the northern part of Central China, with TCCs above 0.7, while the TCC in Northeast China was smaller. In general, the narrowed area of higher TCCs for SDBC led to lower forecast skill, and the expanded area of higher TCCs for SDDBC resulted in improved forecast skill.
Figure 6 shows the anomaly correlation coefficients (ACCs) of the spring air temperature forecasts by the BCC_CSM and the SDBC and SDDBC methods. The BCC_CSM showed statistically significant skill at the 5% level (ACC = 0.31) for the whole of China. The ACCs of SDBC and SDDBC were 0.03 lower and 0.04 higher than that of the BCC_CSM, respectively.
The greatest ACC of the BCC_CSM for temperature forecasting was found in Northeast China (0.25), followed by East China and Northwest China (0.24), and the smallest was in South China (0.15). The ACCs were significant at the 5% level in most regions over the whole country, except South China, which suggests a high forecast skill of the BCC_CSM.
SDBC had a lower ACC than the BCC_CSM in most regions except East China, and the difference was greatest in Northeast China, with a value of 0.09. SDDBC showed a higher ACC in most areas except Northeast China. The greatest improvement in ACC was found in South China (0.22), followed by North China, East China, Central China, and Northwest China (0.11-0.12), and Southwest China (0.05). Thus, in terms of ACC, the SDBC method showed lower skill, while the SDDBC method was clearly more skillful than the BCC_CSM.

(c) MSSS
The MSSS of the BCC_CSM and SDBC for spring air temperature anomaly forecasts in China was 0.18. The MSSS of SDDBC was 0.22, 22% higher than that of the other two methods. Figure 7 shows the MSSS distribution of the three methods used to forecast spring air temperature. The BCC_CSM was skillful, with a positive MSSS in most parts of China except Southwest and South China. The distribution of MSSS was similar to that of TCC, with higher skill in North China than in South China. High MSSS values were found in North China, Northwest China, Northeast China, East China, and the northern …
The MSSS spatial distributions of SDBC and SDDBC were close to that of the BCC_CSM, ranging between −0.37 and 0.44 and between −0.44 and 0.73, respectively. The area of SDBC with a positive MSSS was wider than that of the BCC_CSM. The forecast skill was improved in Southwest China and the southwestern part of Northwest China because of the increased MSSS. However, the MSSS of SDBC was lower than that of the BCC_CSM in almost all of the remaining areas, most notably in South China, where the MSSS turned from positive to negative.
The area of SDDBC with a positive MSSS was much larger than that of the BCC_CSM, although an area with no skill (MSSS < 0) was still detected in South China. Except for Northeast China and South China, where the MSSS decreased, the SDDBC forecast was the most skillful of the three methods. A 23% to 34% improvement in MSSS occurred in North China, East China, Central China, Northwest China, and Southwest China.
MSSS is decomposed into phase skill, amplitude error, and systematic error. The systematic error term is close to 0 when the drift of the climate state is not considered. Thus, the MSSS is mainly determined by the phase skill term and the amplitude error term. Table 1 shows the overall average MSSS for spring temperature anomaly forecasts from 2001 to 2020 based on observations and forecasts by the BCC_CSM, SDBC, and SDDBC methods over China. The phase skill and amplitude error of SDBC both dropped by 0.1 compared with the BCC_CSM. Because these two changes offset each other in the MSSS, the forecast skill of SDBC was unchanged. In contrast, the amplitude error of SDDBC increased by 0.01 and the phase skill increased by 0.05. The increase in phase skill exceeded the increase in amplitude error, which improved the forecast skill of SDDBC.
A lower phase skill of SDBC was found in all seven sub-regions, while a lower amplitude error was detected in most areas, except East and South China. The decrease in phase skill exceeded the decrease in amplitude error, which reduced the forecast capability of SDBC in most sub-regions. More than half of the areas showed greater phase skill and larger amplitude error for the SDDBC method than for the BCC_CSM, the exceptions being Northeast, South, and Southwest China. The increase in phase skill exceeded the increase in amplitude error, which improved the forecast skill of SDDBC in most sub-regions.

Probabilistic Evaluation of Forecast Skill
(a) ROC
ROC and BSS were employed to evaluate the probabilistic forecast skills for the spring air temperature anomalies in China. The skills of the BCC_CSM, SDBC, and SDDBC for above-normal (AN), near-normal (NN), and below-normal (BN) forecasts were compared. Figure 8 shows the ROC diagrams for the three methods. A ROC curve that lies along the 1:1 line indicates no skill, and a curve that bows toward the upper-left corner indicates high skill. For all three methods, the skill was best for BN forecasts, followed by AN and then NN forecasts. The weak anomaly signal of the NN category is difficult to forecast, which lowers the predictability for that category. SDDBC performed better than the BCC_CSM and SDBC for all three (AN, NN, and BN) forecasts.
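How a ROC curve turns into a skill score can be sketched with a trapezoidal area and ROCSS = 2 AUC − 1. The code below is an illustration with hypothetical names and toy inputs; it uses evenly spaced probability thresholds, whereas an ensemble system would typically use member-count fractions, and it defines the false alarm rate as false alarms divided by all non-events.

```python
import numpy as np

def roc_skill_score(prob, event, n_bins=10):
    """Return (ROCSS, AUC) for probability forecasts of a binary event."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=bool)
    # Thresholds run from strict (1.0) to lenient (0.0) so FAR is non-decreasing.
    thresholds = np.linspace(1.0, 0.0, n_bins + 1)
    hr, far = [0.0], [0.0]                     # curve starts at the origin
    for th in thresholds:
        warn = prob >= th                      # a warning is issued at this threshold
        hr.append(np.sum(warn & event) / np.sum(event))
        far.append(np.sum(warn & ~event) / np.sum(~event))
    hr, far = np.array(hr), np.array(far)
    # Trapezoidal area under the curve of HR against FAR.
    auc = np.sum(0.5 * (hr[1:] + hr[:-1]) * (far[1:] - far[:-1]))
    return 2.0 * auc - 1.0, auc

# Toy verification set: did the above-normal category occur?
event = np.array([True, False, True, False, False, True])
perfect = event.astype(float)                  # probability 1 exactly when it occurs
constant = np.full(event.size, 0.5)            # no discrimination at all
inverted = 1.0 - perfect                       # always wrong

rocss_perfect, _ = roc_skill_score(perfect, event)
rocss_constant, auc_constant = roc_skill_score(constant, event)
rocss_inverted, _ = roc_skill_score(inverted, event)
```

The constant forecast yields a curve on the 1:1 line (AUC of 0.5, ROCSS of 0), the perfect forecast reaches the upper-left corner (ROCSS of 1), and the inverted forecast gives the lower bound of −1.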
Figure 9 shows the spatial distribution of the relative operating characteristic skill score (ROCSS) of the spring air temperature for three categorical probabilistic forecasts by the BCC_CSM, SDBC, and SDDBC over China. An ROCSS above 0 denotes a skillful forecast at a station. Overall, the BCC_CSM showed better skill for AN and BN forecasts than for NN forecasts and was more skillful for BN than AN forecasts. The ROCSS in most areas was above 0.2 for AN forecasts in China. The largest ROCSS of 0.43 was found in Central China, followed by North China, East China, and Northwest China, where the ROCSS was larger than 0.35. The ROCSS for BN forecasts was higher than 0.4 over most areas of China, except South China and parts of Southwest China. The ROCSS for NN forecasts was mostly less than 0.2 and showed no forecast skill in some areas of Northwest and Northeast China. The ROCSS spatial distributions of SDBC and SDDBC were similar to that of the BCC_CSM. SDBC was less skillful than the model for all three categorical forecasts. The site-level average ROCSS for AN, NN, and BN forecasts decreased by 0.06, 0.09, and 0.06, respectively. The areas where NN forecasts had no skill (ROCSS less than 0) widened, with stations in Central China, South China, and East China turning from skillful to unskilled. The skill of SDDBC for the three categorical forecasts was better in most areas except Northeast China. For AN forecasts, the ROCSS in North China, East China, Northwest China, and Southwest China increased significantly, by 0.05 to 0.06. For BN forecasts, the ROCSS increased by more than 0.05 in most of China, and by 0.08 to 0.09 in Northwest and Southwest China. For NN forecasts, the unskilled area narrowed significantly. Compared with SDBC, the ROCSS of SDDBC for AN, NN, and BN forecasts improved by 0.11, 0.13, and 0.11, respectively.
(b) BSS
Figure 10 shows the Brier skill score (BSS), the resolution component of the BSS (BSSres), and the reliability component of the BSS (BSSrel) for above-normal (AN), near-normal (NN), and below-normal (BN) forecasts of the spring air temperature anomaly by the BCC_CSM, SDBC, and SDDBC over China. The negative BSS values indicate that the BCC_CSM, SDBC, and SDDBC had no skill for the NN forecasts. The three methods were more skillful, with greater BSS values, for BN forecasts than for AN forecasts. SDBC showed no skill for AN forecasts and was less skillful than the BCC_CSM for BN forecasts. Based on the BSS in Figure 10, SDDBC was clearly the most skillful for both AN and BN forecasts. This result is consistent with the findings from the ROC diagram.
The BSS equation was employed to evaluate the resolution and reliability of the spring air temperature anomaly forecasts. Larger BSSres values suggest higher resolution, and smaller BSSrel values indicate stronger reliability. The BSSres values of the three methods for NN forecasts were close to zero, while the BSSrel values were larger than 0.05. The low resolution combined with the low reliability led to the unskillful NN forecasts by the three methods. The BSSres was larger for BN forecasts than for AN forecasts for all three methods, suggesting higher resolution for BN forecasts. The BSSrel for BN forecasts was smaller than for AN forecasts, indicating stronger reliability for BN forecasts. Therefore, the BSS for BN forecasts was greater than that for AN forecasts, and the three methods showed more skill for BN forecasts.
Compared with the BCC_CSM, the BSSres of SDBC for the three categorical forecasts decreased, and the BSSrel increased, resulting in lower resolution, lower reliability, and lower BSS. The BSSres and BSSrel of SDDBC for the three categorical forecasts both increased, resulting in higher resolution and lower reliability. However, the increase in BSSres was greater than that in BSSrel, so the improvement in resolution exceeded the decrease in reliability, leading to the increase in BSS.
Table 2 shows the BSS for three categorical probabilistic forecasts of spring temperature by BCC_CSM, SDBC, and SDDBC over China and seven sub-regions. The BCC_CSM, SDBC, and SDDBC had no skill for NN forecasts in the seven sub-regions of China. The three methods also had no skill for the AN forecasts in Northeast China. In Northeast China, the BCC_CSM had the largest BSS for BN forecasts among the three methods (0.16), indicating that neither SDBC nor SDDBC improved the skill there. For the other six regions, the BSS of the BCC_CSM for BN forecasts was positive. The largest BSS of 0.2 was detected in Central China, followed by North China and East China with a BSS of 0.18, suggesting better forecast skill in these areas. The BSS for AN forecasts was less than that for BN forecasts in all six regions. Northwest China had the highest BSS of 0.09 for AN forecasts, followed by North and Central China with a BSS of 0.07. In South China, the negative BSS indicates no skill. Compared with the BCC_CSM, the skill of SDBC for AN and BN forecasts decreased by between 0.03 and 0.05. SDDBC presented a larger BSS for both AN and BN forecasts and was more skillful than the BCC_CSM. The BSS for AN forecasts was improved significantly by between 0.03 and 0.04 in North China and East China, while the BSS for BN forecasts was improved significantly by between 0.04 and 0.06 in Northwest China, North China, and Central China.
The lower resolution and reliability of SDBC for AN and BN forecasts in each sub-region led to the lower BSS, while the unchanged resolution and lower reliability for NN forecasts resulted in the decreased BSS. SDDBC improved the resolution for the three categorical forecasts in all sub-regions except Northeast China but degraded the reliability in most regions. Because the improvement in resolution exceeded the decrease in reliability, the BSS increased. Compared with SDBC, SDDBC improved the resolution for the three categorical forecasts in each region, while the change in reliability differed from area to area. This suggests that the BSS improvement of SDDBC was mainly due to the increase in resolution.

Discussion
SDBC can reduce model error by eliminating systematic bias, which agrees with existing studies [32][33][34][35]. However, it did not improve the forecast skill. SDDBC not only retains the advantages of SDBC but also effectively improves probabilistic and deterministic forecast skill by correcting the temperature trend bias. These results indicate that correcting the temperature trend bias is necessary in the post-processing of model outputs. SDDBC performs well for climatic factors that have obvious trends which the model cannot simulate appropriately. It can be used for the forecasting of such factors, for example temperature, solar radiation, extreme temperature events, wind speed, atmospheric circulation indexes, subtropical height field, sea surface temperature, and sea ice.
The results also showed that improving the model's ability to forecast the temperature trend would significantly improve its seasonal temperature forecasts. Future efforts should thus focus on improving the accuracy of the model in simulating climate trends.
In Northeast and South China, SDDBC was less skillful than the BCC_CSM and SDBC, as shown by its lower ACC, MSSS, ROCSS, and BSS. This is mainly due to the insignificant trend of spring air temperature in these two areas. There, trend correction cannot improve the TCC between the BCC_CSM and the observation, and so the forecast skill drops. Therefore, SDDBC may not improve the forecast skill when the trend of a climatic factor is not significant, when the trend is clearly nonlinear, or when the trend is already forecasted well by the model.
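The role of the trend in this discussion can be pictured with a minimal sketch. It is an illustrative assumption rather than the SDDBC procedure itself: it supposes a linear trend fitted by least squares over the hindcast years and simply swaps the forecast's slope for the observed one. The function name `replace_linear_trend` and the toy series are hypothetical.

```python
import numpy as np

def replace_linear_trend(forecast, observed, years):
    """Replace the forecast's linear trend with the observed one.

    Assumption: both trends are linear and are fitted over the same
    years. In an operational setting the observed slope would have to
    be estimated from training years only, not from the target year.
    """
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    years = np.asarray(years, dtype=float)

    # Center time so that the correction leaves the forecast mean unchanged.
    t = years - years.mean()
    forecast_slope = np.polyfit(t, forecast, 1)[0]
    observed_slope = np.polyfit(t, observed, 1)[0]
    return forecast + (observed_slope - forecast_slope) * t


# Hypothetical toy series: the forecast warms more slowly than observed.
years = np.arange(1991, 2021)
t = years - years.mean()
observed = 0.05 * t + 0.3 * np.sin(years)
forecast = 0.01 * t + 0.3 * np.cos(years)
corrected = replace_linear_trend(forecast, observed, years)
```

When the observed slope is close to zero or already matches the forecast slope, the correction term is small or mostly reflects sampling noise, which is consistent with the limited benefit noted above for regions without a significant trend.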

Conclusions
This study assessed the prediction skill of the BCC_CSM, SDBC, and SDDBC for spring air temperature over China from 1991 to 2020 using deterministic and probabilistic forecast verification methods. The influence of temperature trend correction on the seasonal prediction capability of the model was analyzed. The main conclusions can be summarized as follows.
Although the BCC_CSM simulated a significant trend of increasing spring air temperature, the warming rate was obviously underestimated. SDDBC was more skillful than SDBC as it corrected the underestimated air temperature trend.
The BCC_CSM showed a severe cold bias for the spring temperature forecast in China. The RMSE was larger in the west and north than in the east and south. The results of TCC, ACC, and MSSS indicated that the BCC_CSM was skillful for spring air temperature forecast in China, and the skill was higher in the north than in the south. In terms of the probabilistic forecast, the BCC_CSM showed considerable skill in forecasting temperature and was found to be more skillful with a greater BSS for BN forecasts than for AN forecasts, while having minor skill for NN forecasts.
SDBC and SDDBC can effectively eliminate the systematic error of the model and clearly reduce the root mean square error (RMSE). Compared with the model, SDBC cannot improve the anomaly correlation coefficient (ACC) or the mean squared skill score (MSSS) of air temperature forecasts. SDDBC performed better than the model in terms of ACC and MSSS and was also more skillful than SDBC. By correcting the temperature trend bias, it increased the temporal correlation between the model and the observation and improved the skill of the model for phase forecasts, resulting in the better MSSS and ACC. The relative operating characteristic skill score (ROCSS) and the Brier skill score (BSS) of SDDBC for the three categorical forecasts were higher than those of the model, while the scores given by SDBC were lower. For SDDBC, the improvement in resolution exceeded the decrease in reliability, leading to the increase in BSS.
Author Contributions: Conceptualization, C.D. and P.W.; methodology, C.D. and X.W.; software, C.D.; investigation, C.D. and P.W.; data curation, C.D., X.W., Z.C. and R.W.; writing-original draft preparation, C.D. and W.C.; writing-review and editing, C.D., W.C. and R.W.; visualization, C.D.; project administration, Z.C. and P.W.; funding acquisition, C.D. and X.W. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.