A Study of Objective Prediction for Summer Precipitation Patterns Over Eastern China Based on a Multinomial Logistic Regression Model

The prediction of summer precipitation patterns (PPs) over eastern China is an important and topical issue in China. Predictors that are selected based on historical information may not be suitable for the future due to non-stationary relationships between summer precipitations and corresponding predictors, and might induce the instability of prediction models, especially in cases with few predictors. This study aims to investigate how to learn as much information as possible from various and numerous predictors reflecting different climate conditions. An objective prediction method based on the multinomial logistic regression (MLR) model is proposed to facilitate the study. The predictors are objectively selected from a machine learning perspective. The effectiveness of the objective prediction model is assessed by considering the influence of collinearity and number of predictors. The prediction accuracy is found to be comparable to traditionally estimated predictability, ranging between 0.6 and 0.7. The objective prediction model is capable of learning the intrinsic structure of the predictors, and is significantly superior to the prediction model with randomly-selected predictors and the single best predictor. A robust prediction can be generally obtained by learning information from plenty of predictors, although the most effective model may be constructed with fewer predictors through proper methods of predictor selection. In addition, the effectiveness of objective prediction is found to generally improve as observation increases, highlighting its potential for improvement during application as time passes.


Introduction
Summer precipitation over eastern China, a region with a densely distributed population and cultivated/industrial lands, is mainly controlled by the East Asian summer monsoon (EASM). The main belt of precipitation advances northward in a step-wise manner with the seasonally-evolving EASM during the summer. However, various climate factors, with strong inter-annual variability, can lead to complicated regional patterns of precipitation associated with severe floods and droughts in the region [1,2]. For instance, more than four thousand people were killed in the unusually extensive floods in the Yangtze River valleys in 1998, and the direct economic losses were estimated to be over $20 billion. Therefore, prediction of the summer precipitation patterns (PPs) over eastern China has been an important issue for the Chinese government [3]. It helps to determine the focus of flood control and the distribution of disaster-prevention materials. The operational prediction of summer precipitation is routinely made in March by the National Climate Center of China Meteorological Administration (NCC-CMA) to meet public service needs. However, although a considerable amount of research has been done on the prediction of summer precipitation in China, with both statistical and dynamical methods, the skill has remained quite limited [4-6]. Various efforts have been continually made, but it is still hard to improve the prediction skill [7,8]. The present paper introduces a novel approach to the statistical prediction of summer PPs over eastern China by using machine learning, or data-driven, methods.
The key to statistical prediction is the selection of skillful predictors. Many studies have shown close relationships between summer precipitation and simultaneous atmospheric circulations and oceanic signals [9,10]. For instance, the western Pacific subtropical high ridge position and western ridge point were found to be well-associated with the location of the main precipitation belt [11,12], and the growing and decaying phases of El Niño and Southern Oscillation (ENSO) correspond to different PPs in eastern China [13-15]. However, precursors (predictors) rather than simultaneous signals are needed to make predictions. Previous studies have found several important preceding winter climate factors, or predictors, such as ENSO [16-18], snow cover over the Tibetan Plateau [19,20], the North Atlantic Oscillation [21,22], the North Pacific Oscillation [23], atmospheric circulation patterns over East Asia [24], and sea surface temperature over the Indo-Pacific ocean [25]. Statistical models constructed based on these predictors have played an important role in past operational predictions [23].
The main problem of statistical prediction is that the effectiveness of predictors varies in different periods. For instance, the prediction models in [26] showed a 67% accuracy for the period 1989-2000, while this figure was reduced to 42% for 2001-2012 [27]. This suggests that predictors selected based on physical mechanisms and/or statistical relations from historical records may not be suitable for the future due to non-stationary relationships between the climate (precipitation) and corresponding predictors. There is no perpetual dominant physical process that determines the summer PPs due to the nonlinearity and complexity of the climate system. Consequently, it is very difficult to find any persistently effective predictors, and the prediction models must be reconstructed by selecting new predictors in order to ensure a sufficient accuracy. Therefore, in this study, we propose a systematic approach to the prediction of the summer PPs over eastern China, in which the predictors are objectively selected from a machine learning perspective. This is conducted to extract as much useful information as possible from various predictors representing the relevant atmosphere, ocean, and land surface states, and to establish robust statistical relations between the preceding winter climate conditions and the summer PPs via assessing the effectiveness of the objective prediction model.
In general, summer precipitation over eastern China can be categorized into three typical PPs (Figure 1), which are characterized by different locations of the main rainfall belt [28]. In the first PP, the positive anomalies of precipitation are mainly in the Yellow River valley and to the north. In the second PP, the positive anomalies of precipitation are mainly in the Huai River valley, between the Yellow River and the Yangtze River. In the third PP, the positive anomalies are mainly in the Yangtze River valley and to the south. For each summer, a specific PP category is empirically designated by NCC-CMA according to similarity with the real precipitation pattern. The three PPs generally correspond to different climate conditions and are dominated by quite distinct large-scale atmospheric circulation patterns. They are efficient and convenient in depicting summer precipitation over eastern China, and have often been used as indicators in many studies [2,29], particularly in the operational prediction of summer precipitation by NCC-CMA. It is worth noting that the three categories are well suited to machine learning classification methods. Previous predictive studies on PPs were usually based on simple conceptual models, in which the parameters were subjectively designated [27]. In the present study, the PPs are directly determined by predictors and the parameters are iteratively learned by the machine learning classification model.
The multinomial logistic regression (MLR) model, known as a generalized linear classification model, is used to classify PPs [30]. Nonlinear methods are more powerful in fitting data, but they may not have advantages in dealing with short climate records because of the overfitting problem, and are not considered in this study. The generalization ability of the objective prediction model is assessed in this study considering the small sample size and non-stationarity of the climate system. The article is organized as follows: the data and the machine learning method are described in Section 2; the procedures of the objective selection of predictors are introduced in Section 3; the results of training and validation and the generalization ability of the model are analyzed in Section 4; and the main conclusions of the study are summarized in Section 5.

[Figure 1. The three typical summer PPs over eastern China. The thick black line to the north of 32°N indicates the Yellow River and that to the south indicates the Yangtze River. The original precipitation data are available from a gridded dataset for China (http://data.cma.cn/data/detail/dataCode/SURF_CLI_CHN_PRE_MON_GRID_0.5).]

Data
The predictors are mainly taken from a monthly climate indices dataset provided by NCC-CMA (https://cmdp.ncc-cma.net/en). This dataset contains 88 atmospheric circulation indices, 26 sea surface temperature indices, and 16 other climate indices (available from https://cmdp.ncc-cma.net/Monitoring/cn_index_130.php). These 130 indices are not necessarily produced with the aim of predicting summer PPs over eastern China, but involve plenty of climate factors that reflect the global climate. An additional climate index, calculated by averaging the snow depth of weather stations over the Qinghai-Tibetan Plateau, is used as one of the potential predictors mentioned in the Introduction section. The median values of indices in the preceding December and the current January and February are selected to represent the corresponding winter states. It should be noted that spring indices usually have less correlation with summer PPs than those of the preceding winter due to the transitional nature of the spring season [31]. The years of the three PPs defined by NCC-CMA are shown in Table 1. The frequencies of the three PPs are close, with 20, 21, and 20 years, respectively, in the study period. The analyses are performed for the period of 1952-2012, for which all the data are available. Predictors with any missing values or too many equal values are removed, and 84 predictors are preliminarily selected (details in Appendix A).
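As a minimal sketch of how a winter state could be derived from a monthly index, the snippet below takes the median of the preceding December and the current January and February. The function name `winter_state` and the `(year, month)` dictionary layout are assumptions made for illustration and are not taken from the NCC-CMA dataset.

```python
import statistics

def winter_state(monthly, year):
    """Median of Dec (previous year), Jan and Feb (current year) for one index.

    monthly is a hypothetical mapping {(year, month): value}.
    """
    months = [(year - 1, 12), (year, 1), (year, 2)]
    return statistics.median(monthly[key] for key in months)

# Illustrative values only, not real index data.
example = {(1951, 12): 0.4, (1952, 1): -1.0, (1952, 2): 0.1}
print(winter_state(example, 1952))
```

With three months, the median is simply the middle value, which makes the winter state insensitive to a single outlying month.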

Multinomial Logistic Regression
The MLR model is a basic machine learning classification method for multi-class problems [30]. It is a generalization of the logistic regression model [32]. For a training set of $m$ samples with $k$ classes, $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(i)}, y^{(i)}), \ldots, (x^{(m)}, y^{(m)})\}$, the posterior probabilities are given by a normalized exponential form (also known as the softmax function):

$$h_w(x^{(i)})_l = P(y^{(i)} = l \mid x^{(i)}; w) = \frac{\exp(w_l^{T} x^{(i)})}{\sum_{c=1}^{k} \exp(w_c^{T} x^{(i)})}, \quad l = 1, \ldots, k,$$

where $x^{(i)}$ denotes the $i$th input feature vector (i.e., the predictors) with $n$ dimensions, $y^{(i)}$ denotes the label of the $i$th sample, and $w$ denotes the parameters of the model. The hypothesis function gives the probability of a sample $x^{(i)}$ belonging to class $l$. The optimum parameter $w$ can be obtained by maximizing the likelihood function, which is equivalent to minimizing the cost function based on the logarithmic likelihood function:

$$J(w) = -\sum_{i=1}^{m} \sum_{l=1}^{k} \mu\{y^{(i)} = l\} \log \frac{\exp(w_l^{T} x^{(i)})}{\sum_{c=1}^{k} \exp(w_c^{T} x^{(i)})},$$

where $\mu\{\cdot\}$ is the indicator function, so that $\mu\{y^{(i)} = l\} = 1$ when $y^{(i)} = l$ is true and $\mu\{y^{(i)} = l\} = 0$ when $y^{(i)} = l$ is false. An iterative optimization algorithm such as gradient descent can be used to solve the problem and find the minimum of $J(w)$. To guarantee that the cost function $J(w)$ is strictly convex, a weight decay penalty term should be added to penalize large values of the parameters. Two popular penalty terms are $\lambda \sum_{l=1}^{k} \sum_{j=1}^{n} w_{lj}^{2}$ and $\lambda \sum_{l=1}^{k} \sum_{j=1}^{n} |w_{lj}|$, which are known as L2 regularization and L1 regularization, respectively [30]. In this study, the MLR model with L2 regularization is used to classify the three PP classes. In addition, the MLR model with L1 regularization usually obtains sparse parameters, which means that only part of the features are used in the model.
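The softmax probabilities and the L2-penalized cost described above can be illustrated with a small standard-library sketch. The function names `softmax_probs` and `cost_l2`, the 0-based class labels, and the omission of an intercept term are assumptions made for this illustration; they are not the authors' implementation.

```python
import math

def softmax_probs(w, x):
    """Class probabilities for one sample x, given k weight rows of length n."""
    scores = [sum(wj * xj for wj, xj in zip(row, x)) for row in w]
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cost_l2(w, X, y, lam):
    """Negative log-likelihood of the labels plus the L2 weight-decay penalty."""
    nll = -sum(math.log(softmax_probs(w, x)[label]) for x, label in zip(X, y))
    penalty = lam * sum(wj * wj for row in w for wj in row)
    return nll + penalty

# With all-zero weights every class is equally likely (1/3 each for k = 3).
w0 = [[0.0, 0.0] for _ in range(3)]
print(softmax_probs(w0, [1.0, 2.0]))
```

In practice the minimization of such a cost is delegated to an optimizer; the sketch only evaluates the quantities that the optimizer would work with.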

Objective Selection of Predictors
The selection of predictors is an essential procedure in many machine learning problems, especially for cases with multiple features and small datasets [33-36]. One of the reasons for this is collinearity amongst predictors. Collinearity refers to the non-independence of predictor variables (sometimes also called multicollinearity; [37,38]). It may inflate the variance of regression parameters in parameter estimation, and lead to the instability of statistical models in predictions [39]. The collinearity of the 84 predictors used in this study is simply estimated by the linear correlation between every two predictors. The Pearson correlation map for the 84 predictors is shown in Figure 2. It can be seen that many predictors are linearly correlated. The number of Pearson correlation coefficients (CCs) larger than 0.9 amounts to 34, and the number larger than 0.95 to 19, indicating considerable collinearity amongst predictors. A basic process to eliminate highly correlated predictors is used in the study. The procedure is as follows:
1. Calculate the Pearson CC matrix of all predictors;
2. Find the maximum absolute CC between predictors $x_i$ and $x_j$ in the matrix. Terminate the process if the maximum absolute CC is less than the threshold $C_{thrd}$;
3. Calculate the average absolute CC between $x_i$ and all other predictors, and do the same with $x_j$;
4. Remove the predictor with the larger average CC;
5. Return to step 1.

The above is the first step of the objective predictor selection, which aims to diminish the collinearity of predictors. The threshold $C_{thrd}$ is varied to estimate the influence of different degrees of collinearity on the prediction model. For $C_{thrd}$ of 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9, 15, 26, 31, 41, 50, 60, and 71 predictors are retained after elimination, respectively. This procedure does not lose much predictive information, since eliminated predictors can be well-represented by at least one of the remaining predictors. Specifically, the maximum values of CCs between each removed predictor and the remaining predictors are shown to be mostly larger than 0.9 (Table 2).

The second step of the objective predictor selection is to eliminate useless predictors. These predictors provide little predictive information and may cause overfitting of the model. In this study, we employ three different schemes of predictor selection for a comparative analysis, in order to avoid relying on the occasionally good performance of any single scheme and to identify a better choice of predictor-selection scheme. The first scheme (single-MLR) is based on the accuracy of the MLR model built with single predictors. The second scheme (F-ratio) is based on an analysis of variance of the predictors [40]. For a single predictor, the F-ratio characterizes the differences in the predictor values among the three PPs. The larger the F-ratio is, the greater the differences in predictor values are. This implies a higher skill of classifying the three PPs for a predictor with a larger F-ratio. The third scheme (L1) is based on the MLR model with L1 regularization introduced in Section 2.2. The predictors retained after step one are further selected through the three schemes. The number of predictors ($N_{pre}$) is varied to search for the optimum parameters.
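The collinearity-elimination step can be sketched as a short loop. This is an illustrative reading of the procedure, not the authors' code: the names `pearson` and `eliminate_collinear` are hypothetical, and the sketch assumes that the most strongly correlated pair is examined at each iteration and that elimination stops once that pair falls below the threshold.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)

def eliminate_collinear(predictors, c_thrd):
    """Return the names of the predictors retained for the threshold c_thrd.

    predictors maps a (hypothetical) predictor name to its list of yearly values.
    """
    names = list(predictors)
    while len(names) > 1:
        # Absolute CC of every pair of remaining predictors.
        cc = {(a, b): abs(pearson(predictors[a], predictors[b]))
              for i, a in enumerate(names) for b in names[i + 1:]}
        # Most strongly correlated pair; stop once it is below the threshold.
        (a, b), strongest = max(cc.items(), key=lambda item: item[1])
        if strongest < c_thrd:
            break

        # Average absolute CC of one predictor with all the others.
        def mean_cc(name):
            values = [v for pair, v in cc.items() if name in pair]
            return sum(values) / len(values)

        # Drop the member of the pair that is more correlated with the rest.
        names.remove(a if mean_cc(a) >= mean_cc(b) else b)
    return names
```

Recomputing the full correlation table in every iteration is wasteful but keeps the sketch close to the written steps; for 84 predictors and 61 years the cost is negligible.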

Training and Validation of the Objective Prediction Model
The dataset consists of 61 samples, with 21 samples in the largest class. Samples are split into 10 subsets for the stratified 10-fold cross-validation (CV). The stratification here means keeping a relatively constant ratio of the three-class samples in each fold. A robust CV score, which is defined as the mean accuracy of the classification, is obtained by repeating the split of the samples 1000 times. The baseline of the classification accuracy is naturally 1/3. Each predictor is centered and normalized before the procedures of predictor selection. The CV score of the MLR model constructed with every single predictor is tested (Figure 3) to give an overview of the effectiveness of all single predictors. Most of them are found around the baseline, with a mean value of 0.36. The third predictor, the North African-North Atlantic-North American Subtropical High Ridge Position Index, is capable of classifying the three PPs with the highest mean accuracy (0.55). It is hard to say whether the higher scores are related to certain distinct physical mechanisms between the predictor and the PPs or not. An interpretation from the viewpoint of machine learning is that the summer PP is more likely to be a certain class if a preceding winter climate factor is under certain conditions than it is if other factors are. The situation is similar for multiple combinations of predictors. In the following study, we mainly focus on how well these climate factors tend to make correct classifications.

A dynamic experiment is implemented to validate the accuracy with different $C_{thrd}$s, $N_{pre}$s, and predictor-selection schemes. $C_{thrd}$s are chosen from 0.3 to 1, with a stride of 0.1. A $C_{thrd}$ of 1 means that no predictors are eliminated. $N_{pre}$s are chosen from 5 to 35, with a stride of 5. The results are shown in the upper panel of Figure 4. Each color block indicates the CV score for selected predictors with fixed $C_{thrd}$ and $N_{pre}$. Apparently, the scores vary with $C_{thrd}$ and $N_{pre}$ for all schemes, exhibiting larger values when $C_{thrd}$ and $N_{pre}$ are in certain ranges. The CV scores are generally in the range of 0.5-0.7, with the highest scores of 0.67, 0.62, and 0.71 for the three schemes, respectively. The scheme L1 shows a higher CV score compared to the other two. Besides, all three schemes show generally higher CV scores than those from single predictors, indicating the higher effectiveness of MLR models with predictor selection. The CV score here implies the accuracy of objectively recognizing PPs through preceding winter climate factors for the whole observation period.

Further relationships between the CV score and different $C_{thrd}$s and $N_{pre}$s are investigated. On average, CV scores increase with $C_{thrd}$ until about 0.8 and then decrease (Figure 5a). This implies the importance of eliminating the collinearity of predictors. The optimum threshold is a compromise between the influence of collinearity and the adequacy of the predictive information. The impact of collinearity is significant for the scheme single-MLR, since the CV score reaches its minimum when no elimination is performed (i.e., $C_{thrd}$ = 1). The different behaviors of the three schemes with respect to collinearity probably arise from the different processes of predictor selection in step 2. Scheme L1 tends to eliminate correlated predictors internally, and therefore diminishes the influence of collinearity in the prediction model. The predictors selected by the high score of a single predictor based on the scheme single-MLR are correlated to some extent, inducing a lower effectiveness of the prediction model.
Similar features can be found for the relationship between CV score and $N_{pre}$ (Figure 5b). The CV scores reach their maximum when $N_{pre}$ equals 15 and 20 for the schemes F-ratio and L1, respectively. For the scheme single-MLR, the CV score increases slightly after reaching a high level at $N_{pre}$ of 20. In addition, the CV score of the scheme L1 is significantly larger than the other two for small $N_{pre}$, implying that scheme L1 is able to select more informative predictors. All these results suggest the importance of the objective selection of predictors in the prediction. The optimal prediction model can only be obtained when collinearity is properly eliminated and the numbers of predictors are properly selected.

[Figure 5. (b) The same as (a), but for the average scores of $C_{thrd}$s in the range of 0.5-1 dependent on $N_{pre}$s.]
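The repeated stratified 10-fold CV score used in this section can be sketched as follows. The helper names and the `fit_predict` callback (which would wrap the MLR model) are hypothetical, and the round-robin dealing of shuffled class members is only one simple way, assumed here, of keeping the class ratio roughly constant in each fold.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds, rng):
    """Deal the shuffled indices of each class round-robin into n_folds folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    offset = 0  # continue dealing where the previous class stopped
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(offset + pos) % n_folds].append(idx)
        offset += len(members)
    return folds

def repeated_cv_score(X, y, fit_predict, n_folds=10, n_repeats=1000, seed=0):
    """Mean held-out accuracy over n_repeats random stratified splits.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    trains a classifier and returns predicted labels for X_test.
    """
    rng = random.Random(seed)
    fold_accuracies = []
    for _ in range(n_repeats):
        for test_idx in stratified_folds(y, n_folds, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            predictions = fit_predict([X[i] for i in train_idx],
                                      [y[i] for i in train_idx],
                                      [X[i] for i in test_idx])
            hits = sum(p == y[i] for p, i in zip(predictions, test_idx))
            fold_accuracies.append(hits / len(test_idx))
    return sum(fold_accuracies) / len(fold_accuracies)
```

With 61 samples in classes of 20, 21, and 20, this dealing scheme yields folds of six or seven samples, each containing at least two samples of every class.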
Fitting scores of the MLR model, which indicate the accuracy on the training samples, are generally over 0.65, as shown in Figure 4 (lower panel) and Figure 5. The most apparent feature is that the fitting score is higher for a larger number of predictors. It is clearly shown in Figure 5b that the fitting score increases with $N_{pre}$ and roughly reaches its maximum for the largest $N_{pre}$. $C_{thrd}$ does not have as much of an influence as $N_{pre}$ on the fitting score, except for the scheme single-MLR, for which the fitting score decreases distinctly when $C_{thrd}$ = 1. Histograms of the fitting score for the dynamic experiments are shown in Figure 6. On average, the CV score does not increase monotonically with the fitting score. The maximum CV score appears when the fitting score is in the range of 0.8-0.9. Beyond this range, the overfitting problem is serious and the CV score decreases. Further investigation indicates that the overfitting problem mainly originates from the number of predictors. As shown in Figure 5b, the CV score does not increase beyond an $N_{pre}$ of 15-20, while the fitting score keeps increasing.

The experiments indicate that $C_{thrd}$s of about 0.6-0.9 and $N_{pre}$s of 15-20 will generally be the optimal parameters for the objective prediction models. Based on this, a particular model (Model-opt) with $C_{thrd}$ of 0.8 and $N_{pre}$ of 15 is chosen to investigate the details of different predictor-selection schemes. Table 3 shows the 15 indices finally selected via the three schemes for the Model-opt model, for which 60 predictors are retained after collinearity elimination. The indices are sorted in descending order by the importance of the predictors. Predictor indices No. 3 (North African-North Atlantic-North American Subtropical High Ridge Position), 15 (Pacific Polar Vortex Intensity), and 50 (Atlantic-European Circulation E Pattern Index) are present in all three schemes, and the former two indices are always among the top three. The results highlight two precursors, the North African-North Atlantic-North American Subtropical High Ridge Position and the Pacific Polar Vortex Intensity, for summer PPs over eastern China. Besides, more common predictors are present for every two schemes. These common predictors in different schemes would probably be more valuable in performing predictions.

A permutation-based test is applied to evaluate the significance of the classifications. The test assesses whether the model has found a real class structure in the data. The corresponding null distribution is estimated by permuting the labels of the samples [41]. The permutation test is implemented for the Model-opt model described above. First, a permuted dataset is constructed by randomly permuting the labels of the 61 samples, and the permutation score is then evaluated by the repeated 10-fold CV score. The permutation test is repeated 1000 times to get the null distribution. As shown in Figure 7, the CV scores of the Model-opt model are significantly larger than the permutation scores for all three schemes, implying that the model is able to reveal the intrinsic structure of the predictors corresponding to a certain class.

The effectiveness of the objective prediction model needs to be evaluated. It has already been shown that the average and maximum CV scores for individual predictors are generally smaller than those of the objective prediction model. However, it is unknown whether the objective prediction model is superior to the prediction model without predictor selection. MLR models with different combinations of randomly-selected predictors are implemented to facilitate a comparative analysis. The repeated 10-fold CV score is used as above. The CV scores of randomly-selected predictors with different numbers are found to be significantly lower than those of objective selection for all schemes (Figure 8). In addition, the CV scores of randomly-selected predictors generally increase with elimination of the collinearity of predictors, but are still lower than those of the objective prediction model (Figure 8). This result further confirms the necessity of eliminating the collinearity of predictors and assessing the effectiveness of the objective prediction model. We also notice that the CV score roughly increases with $N_{pre}$, which is different from that of the objective prediction model. The reason for this likely arises from the fact that the overfitting problem is more serious when less-informative predictors are added to the objective prediction model. Nonetheless, the MLR model is able to learn much more information from objectively-selected predictors than from randomly-selected predictors. Moreover, even the MLR model with randomly-selected predictors can provide useful information for predicting summer PPs, implying learnable intrinsic relationships between global climate factors and summer PPs over eastern China.
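The permutation test described earlier in this section can be sketched as below. The function `permutation_test` and its `score_fn` argument (a callable that returns a score, such as the repeated CV score, for a given label vector with the predictors held fixed) are hypothetical. The p-value convention with one added to numerator and denominator is a common choice assumed here; the text itself only compares the CV score with the null distribution.

```python
import random

def permutation_test(y, score_fn, n_perm=1000, seed=0):
    """Compare the score for the true labels with scores for permuted labels.

    Returns the observed score, the null distribution, and an empirical p-value.
    """
    rng = random.Random(seed)
    observed = score_fn(y)
    null_scores = []
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any link between labels and predictors
        null_scores.append(score_fn(shuffled))
    n_exceed = sum(score >= observed for score in null_scores)
    p_value = (n_exceed + 1) / (n_perm + 1)
    return observed, null_scores, p_value
```

A score well above the bulk of `null_scores` suggests that the classifier has picked up structure tied to the real labels rather than to chance.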

Generalization Ability of the Objective Prediction Model
The accuracy of the prediction model may decrease in the future due to the non-stationary relationships between summer precipitation and the corresponding predictors, as mentioned in the Introduction section. In climatic physics, the main reason for this arises from the fact that the predictors selected based on entire historical records will become less informative for the future period than for the past. The accuracy obtained from the above experiments would consequently be overestimated. This is similar to traditional predictions, which usually overestimate the accuracy of operational prediction. Hence, it is necessary to estimate the influence of non-stationary relationships on predictions, or in other words, to estimate the generalization ability of the objective prediction model. Considering that there are only a few test samples in observation, an experiment in which predictors are selected based on a random part of the records is implemented. All the records are treated as independent samples, and the experiment can be repeated multiple times to get a robust test score. It should be noted that this is different from the CV test in Section 4.1, in which the predictors are selected according to all samples and common predictors are used for all splits in the 10-fold tests. The predictor-selection schemes and related parameters $C_{thrd}$ and $N_{pre}$ coincide with those of the previous models. To learn as much information as possible from predictors, the training size is maximized and only three samples (one for each class) are used as the test set. Additionally, the predictors are centered and normalized using the training set, and the corresponding parameters are then applied to the test set. The experiment is repeated 300 times to obtain stable scores.
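The two mechanical parts of this experiment, holding out one sample per class and normalizing with training-set statistics only, can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify the normalization formula.

```python
import random
import statistics

def split_one_per_class(labels, rng):
    """Hold out one randomly chosen sample of each class as the test set."""
    test_idx = []
    for cls in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cls]
        test_idx.append(rng.choice(members))
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    return train_idx, test_idx

def standardize_with_train(train_values, test_values):
    """Center and scale one predictor using training statistics only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)  # population std (an assumption)
    scaled_train = [(v - mean) / std for v in train_values]
    scaled_test = [(v - mean) / std for v in test_values]
    return scaled_train, scaled_test

labels = [0] * 20 + [1] * 21 + [2] * 20
train_idx, test_idx = split_one_per_class(labels, random.Random(0))
print(len(train_idx), len(test_idx))
```

Computing the mean and standard deviation from the training samples alone keeps information about the held-out years from leaking into predictor selection and model fitting.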
The test scores and corresponding CV scores for different Cthrds and Npres under the three schemes are shown in Figure 9. Similar to Figure 4, both the test scores and the CV scores exhibit larger values in certain areas. The test scores are generally in the range of 0.4-0.55, smaller than the CV scores of 0.5-0.65. The difference of roughly 0.1 suggests the influence of non-stationary relationships, as discussed above. Scheme L1 has higher test scores than the other two, which is consistent with the CV scores. However, the maximum test scores of the three schemes are quite close, with values of 0.55, 0.53, and 0.55, respectively. This implies an upper limit of the generalization ability of the prediction model. The standard deviations of the CV scores and test scores are ~0.04 and ~0.27, respectively, regardless of Cthrd, Npre, and scheme.
More details on how the test scores and corresponding CV scores vary with Cthrd and Npre are shown in Figure 10. The variations of the test scores with Cthrd are consistent with those of the CV scores for all three schemes, which highlights the importance of properly eliminating collinearity for prediction. The variations of the test scores with Npre are more complicated. For the schemes single-MLR and F-ratio, the test score increases with Npre until reaching a high level of about 0.5 for Npre larger than 20. In contrast, for scheme L1, the test scores remain at a high level for all Npres, with the highest test score at an Npre of 10. The results imply that a high test score, or in other words, a robust prediction, can generally be obtained by learning information from plenty of predictors. Moreover, high test scores can also be obtained from fewer predictors through a proper method of selection (i.e., scheme L1).
Relationships between the test scores and corresponding CV scores are important for assessing the stability of the prediction model. As shown in Figure 11, the test scores generally increase with the corresponding CV scores, suggesting that it is worthwhile to improve the training accuracy (CV score) of prediction models. More specifically, for scheme L1, both the test score and the corresponding CV score reach their maximum at an Npre of 10. In comparison, for the schemes single-MLR and F-ratio, there is a slight shift: the test scores reach their maximum at Npres of 30 and 25, respectively, whereas the corresponding CV scores peak at Npres of 25 and 20 (Figure 10). The prediction model with scheme L1 performs better than the other two, highlighting its application potential for the prediction of summer PPs.
Finally, we investigate whether the learned information grows with increasing observations. Experiments with different training sizes are hence implemented for the three predictor-selection schemes. Npres of 5-35 and a Cthrd of 0.8 are chosen. The test size is again set to 3, and the test is repeated 300 times, as described previously. Figure 12 shows how the test and CV scores, averaged over the Npres, vary with sample size. The test scores are found to generally increase with sample size, while the CV scores vary smoothly. The results confirm that the effectiveness of objective prediction would improve as observations increase. Meanwhile, the smooth variations of the CV scores suggest that an upper limit probably exists for the objective prediction model.
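The training-size experiment can be illustrated with the sketch below. It rests on loud assumptions: a nearest-centroid rule is swapped in for the MLR model purely to keep the example self-contained, the data are synthetic, and the function names and the list of sizes are hypothetical. Only the test size of 3 (one sample per class) and the 300 repetitions come from the text.

```python
import random

def nearest_centroid_predict(X_train, y_train, x):
    """Stand-in classifier (NOT the MLR model): pick the closest class mean."""
    best_cls, best_dist = None, float("inf")
    for cls in sorted(set(y_train)):
        rows = [r for r, c in zip(X_train, y_train) if c == cls]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_cls, best_dist = cls, dist
    return best_cls

def mean_test_score(X, y, train_size, n_repeats, rng):
    """Average accuracy on one held-out sample per class over repeated draws."""
    classes = sorted(set(y))
    hits, total = 0, 0
    for _ in range(n_repeats):
        test_idx = [rng.choice([i for i, c in enumerate(y) if c == cls])
                    for cls in classes]
        pool = [i for i in range(len(y)) if i not in test_idx]
        train_idx = rng.sample(pool, train_size)  # subsample the training set
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        hits += sum(nearest_centroid_predict(X_tr, y_tr, X[i]) == y[i]
                    for i in test_idx)
        total += len(test_idx)
    return hits / total

# Synthetic data (assumption): 61 samples, one weakly informative predictor
# plus one pure-noise predictor.
rng = random.Random(1)
y = [1 + (i % 3) for i in range(61)]
X = [[rng.gauss(0.8 * c, 1.0), rng.gauss(0.0, 1.0)] for c in y]

sizes = [20, 30, 40, 50, 58]  # hypothetical training sizes
curve = {n: mean_test_score(X, y, n, 300, rng) for n in sizes}
```

Plotting `curve` against the training size gives the kind of relationship summarized in Figure 12, although the values produced by this stand-in have no bearing on the reported scores.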

Summary and Discussion
This article presents a study of the objective prediction of summer PPs over eastern China based on the MLR model. The purpose is to investigate how to learn as much information as possible from various predictors by means of the MLR model and, on this basis, to assess the effectiveness of the objective prediction model. The predictors are selected objectively, without being limited by physical mechanisms, from 84 preceding-winter climate factors. Three predictor-selection schemes are involved in the study. The optimal prediction model, together with the influence of collinearity among predictors and of the number of predictors, is estimated by varying the parameters Cthrd and Npre.
The CV scores are found to be higher within certain ranges of Cthrd and Npre for all schemes, suggesting that the optimal prediction model can only be obtained when collinearity is properly eliminated and the number of predictors is properly selected. Cthrds of about 0.6-0.9 and Npres of 15-20 are found to be roughly the optimal parameters for the objective predictions. The highest scores are comparable with the traditionally estimated upper limits of predictability, which range from 0.6 to 0.7 [42], reflecting the effectiveness of the objective prediction method. Moreover, the MLR model is found to be able to reveal the intrinsic structure of the predictors corresponding to a certain class and to learn much more information from objectively selected predictors than from randomly selected predictors or a single predictor. All the results suggest the importance and effectiveness of the objective selection of predictors for predictions.
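As an illustration of what eliminating collinearity with a threshold Cthrd can look like, the sketch below keeps predictors in a given priority order and drops any predictor whose absolute Pearson correlation with an already retained one exceeds the threshold. This greedy rule is an assumption made for the example; the elimination procedure actually used in the study is defined in its methods section and may differ. All names and data are hypothetical.

```python
import random
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)

def drop_collinear(predictors, c_thrd):
    """Greedy filter (assumed rule): keep a predictor only if its |r| with
    every predictor kept so far does not exceed c_thrd."""
    kept = {}
    for name, values in predictors.items():
        if all(abs(pearson(values, other)) <= c_thrd for other in kept.values()):
            kept[name] = values
    return list(kept)

# Synthetic example: "b" is almost a rescaled copy of "a"; "c" is independent.
rng = random.Random(2)
a = [rng.gauss(0.0, 1.0) for _ in range(61)]
b = [2.0 * v + 0.01 * rng.gauss(0.0, 1.0) for v in a]
c = [rng.gauss(0.0, 1.0) for _ in range(61)]

remaining = drop_collinear({"a": a, "b": b, "c": c}, c_thrd=0.8)
```

With a Cthrd of 0.8, the near-duplicate predictor "b" is dropped while "a" and "c" remain; a lower threshold removes more predictors and a threshold of 1.0 removes none.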
The generalization ability of the objective prediction model is assessed by experiments in which predictors are selected based on a part of the records. The test scores decrease by roughly 0.1 on average compared to the corresponding CV scores, suggesting the influence of non-stationary relationships between summer precipitation patterns and the corresponding predictors on predictions. The results suggest that a robust prediction can generally be obtained by learning information from plenty of predictors, although the highest test score may be obtained from fewer predictors through a proper method of predictor selection. This study also implies that an upper limit of the objective prediction model probably exists, and that the limit is consistent with the predictability analyzed in other studies [42]. In addition, the results suggest that the effectiveness of objective prediction would generally improve as observations increase, highlighting its potential use in the operational prediction of summer PPs.
Two nonlinear machine learning methods (random forest and multi-layer perceptron) were used to test the influence of different methods on the results with respect to the objective selection of predictors (results not shown). The CV scores obtained by these two nonlinear methods are comparable to those of the MLR model, although their fitting scores are very high. The major problems in the prediction of summer PPs over eastern China are the shortage of observations and the non-stationarity of the climate system. Consequently, it is hard to find effective predictors, specifically, robust relationships between individual preceding-winter climate factors and summer PPs. Even for predictors with reasonable physical mechanisms, the relationships would change along with climate change, let alone the physical mechanisms related to other predictors, which may be dominant during other periods. The situation would be worse if only a few predictors were selected [27]. Multiple predictors objectively selected from the global climate may partly overcome this problem, and this is the most important motivation for this study.
It is notable that the summer PPs were treated as independent samples in this study, which would overestimate the predictability to some extent. We also tested Model-opt with scheme L1 using training samples from the years 1952-1992, 1952-1993, ..., 1952-2011 and testing samples from 1993, 1994, ..., 2012. The test score is also about 0.5, suggesting that this assumption is reasonable. On the other hand, additional signals may be found from previous summer PPs, since they are essentially time-dependent. How to incorporate the time-dependent information in the objective prediction model is worth considering in further studies. Nevertheless, this objective approach provides a meaningful baseline for the prediction of summer PPs over eastern China. This study expands and improves the knowledge of the prediction of summer PPs over eastern China, and the objective approach can also be applied to the prediction of other regional climate events.
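The time-ordered test mentioned above (training on 1952-1992 to predict 1993, then on 1952-1993 to predict 1994, and so on up to 2012) corresponds to an expanding-window split. A minimal sketch of how such splits can be generated is given below; the function name is hypothetical, and model fitting is left out.

```python
def expanding_window_splits(first_year, first_test_year, last_test_year):
    """Yield (training_years, test_year) pairs with a growing training period."""
    for test_year in range(first_test_year, last_test_year + 1):
        # Train on every year strictly before the test year.
        yield list(range(first_year, test_year)), test_year

splits = list(expanding_window_splits(1952, 1993, 2012))
```

Each test year is predicted from earlier years only, so this split does not rely on the assumption that the samples are independent in time.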

Atmosphere 2019 , 19 Figure 1 .
Figure 1. Typical precipitation patterns of class I (left), II (middle), and III (right) over eastern China. The contoured data is the average of the percentage of anomalies of summer precipitation during 1961-2012 for each class according to Table 2. The thick black line to the north of 32°N indicates the Yellow River and that to the south indicates the Yangtze River. The original precipitation data is available from a gridded dataset for China (http://data.cma.cn/data/detail/dataCode/SURF_CLI_CHN_PRE_MON_GRID_0.5).

Figure 2. Pearson correlation maps for 84 predictors selected in the study during 1952-2012. The names of predictors corresponding to the indices are listed in Appendix A.

Figure 3. The repeated 10-fold cross-validation (CV) score of the multinomial logistic regression (MLR) model using one individual predictor. The black line indicates the baseline of the classification accuracy of 0.33. The names of predictors corresponding to the indices are listed in Appendix A.

Figure 4. The repeated 10-fold CV scores (upper panel) and fitting scores (lower panel) for different Cthrds and Npres, and predictor-selection schemes single-MLR (left panel), F-ratio (middle panel), and L1 (right panel).

Figure 5. (a) The average repeated 10-fold CV scores (circle) and fitting scores (square) of all Npres dependent on Cthrds, for the schemes single-MLR (blue), F-ratio (black), and L1 (red), respectively. (b) The same as (a), but for the average scores of Cthrds in the range of 0.5-1 dependent on Npres.

Figure 6. Histogram of the fitting scores with respect to CV scores (data refer to Figure 4) for three schemes, respectively.

The permutation test is repeated 1000 times to get the null distribution.
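A permutation test of this kind can be sketched as follows. The sketch is illustrative only: the score function is a leave-one-out nearest-class-mean accuracy on a single synthetic predictor, swapped in for the repeated 10-fold CV score of the MLR model, and all names and data are hypothetical. The p-value uses the common convention of adding one to both the numerator and the denominator.

```python
import random
from statistics import mean

def loo_nearest_mean_accuracy(x, y):
    """Stand-in score: leave-one-out accuracy of a 1-D nearest-class-mean rule."""
    sums, counts = {}, {}
    for xi, yi in zip(x, y):
        sums[yi] = sums.get(yi, 0.0) + xi
        counts[yi] = counts.get(yi, 0) + 1
    hits = 0
    for xi, yi in zip(x, y):
        best_cls, best_dist = None, float("inf")
        for cls in sorted(sums):
            s, n = sums[cls], counts[cls]
            if cls == yi:  # leave the current sample out of its own class mean
                s, n = s - xi, n - 1
            if n == 0:
                continue
            dist = abs(xi - s / n)
            if dist < best_dist:
                best_cls, best_dist = cls, dist
        hits += best_cls == yi
    return hits / len(y)

def permutation_test(x, y, score_fn, n_permutations, rng):
    """Score the true labels, then build a null distribution from shuffled labels."""
    observed = score_fn(x, y)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = y[:]
        rng.shuffle(shuffled)  # break the predictor-label relationship
        null_scores.append(score_fn(x, shuffled))
    exceed = sum(s >= observed for s in null_scores)
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, null_scores, p_value

# Synthetic data (assumption): one predictor whose mean depends on the class.
rng = random.Random(3)
y = [1 + (i % 3) for i in range(61)]
x = [rng.gauss(1.5 * c, 1.0) for c in y]

observed, null_scores, p_value = permutation_test(
    x, y, loo_nearest_mean_accuracy, 1000, rng)
```

A histogram of `null_scores` with `observed` marked on it has the same layout as the panels described in the Figure 7 caption.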

Figure 7. Histograms of permutation scores evaluated from the objective prediction model with Cthrd of 0.8 and Npre of 15 by 1000 times for the schemes single-MLR (left panel), F-ratio (middle panel), and L1 (right panel), respectively. The red dashed line indicates the CV score of the Model-opt model.

Figure 8. The repeated 10-fold CV scores (solid line) for MLR models with randomly selected predictors dependent on different Npres and Cthrds. The CV scores of the objective prediction model (dashed line; refer to Figure 5) are shown for comparison.

Figure 9. The test scores (upper panel) and repeated 10-fold CV scores (lower panel) for different Cthrds and Npres, and predictor-selection schemes single-MLR (left panel), F-ratio (middle panel), and L1 (right panel).

Figure 10. (a) The test scores (circle) and average repeated 10-fold CV scores (square) of all Npres dependent on Cthrds, for the schemes single-MLR (blue), F-ratio (black), and L1 (red). (b) The same as (a), but for the average scores of Cthrds in the range of 0.5-1 dependent on Npres.

Figure 11. Histogram of the CV scores with respect to test scores in Figure 9 for three schemes, respectively.

Figure 12. The average test scores (triangle) and repeated 10-fold CV scores (circle) for Npres of 5-35 and Cthrd of 0.8 dependent on sample sizes for the schemes single-MLR (blue), F-ratio (black), and L1 (red).

Table 1. Classifications of summer precipitation patterns over eastern China during 1952-2012.

Table 2. The maximum of correlation coefficients (CCs) between eliminated predictors and remaining predictors. The names of predictors corresponding to the indices are listed in Appendix A.

Table 3. Selected predictors sorted in descending order of the importance of the predictors via three predictor-selection schemes with Cthrd of 0.8 and Npre of 15. Common predictors for three schemes are shaded in green, and gray indicates unique predictors in three schemes. The names of predictors corresponding to the indices are listed in Appendix A.
A SSTA Index
58. NINO B SSTA Index
59. NINO Z SSTA Index
60. Tropical Northern Atlantic SST Index
61. Tropical Southern Atlantic SST Index
62. Indian Ocean Warm Pool Area Index
63. Indian Ocean Warm Pool Strength Index
64. Western Pacific Warm Pool Area Index
65. Western Pacific Warm Pool Strength Index
66. Atlantic Multi-decadal Oscillation Index
67. Oyashio Current SST Index
68. West Wind Drift Current SST Index
69. Kuroshio Current SST Index
70. ENSO Modoki Index
71. Warm-pool ENSO Index
72. Cold-tongue ENSO Index
73. Indian Ocean Basin-Wide Index
74. Tropic Indian Ocean Dipole Index
75. South Indian Ocean Dipole Index
76. Cold Air Activity Index
77. Total Sunspot Number Index
78. Southern Oscillation Index
79. Multivariate ENSO Index
80. Pacific Decadal Oscillation Index
81. Atlantic Meridional Mode SST Index
82. Quasi-Biennial Oscillation Index
83. Solar Flux Index
84. Average snow depth over Tibet Plateau