4.1. Simulation I: Fixed Number of Candidate Models
In this section, we conduct simulation experiments to compare the finite-sample performance of our model averaging methods with some commonly used model selection and model averaging methods. For model selection, we consider three methods: AIC, BIC, and FPCA. FPCA is an efficient and commonly used method in functional data analysis, which determines the final model by the cumulative contributions of the functional principal components. For model averaging, we consider the following methods: S-AIC (smoothed AIC), S-BIC (smoothed BIC), and our cross-validation model averaging method, denoted CV1 when the sum of weights is restricted to 1 as before, and CV2 when no constraint on the sum of weights is imposed.
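For reference, the smoothed-IC weights used by S-AIC and S-BIC take the standard form w_m ∝ exp(−IC_m/2), normalized to sum to 1. A minimal sketch (the function name is ours):

```python
import numpy as np

def smoothed_ic_weights(ic_scores):
    """Smoothed-IC (S-AIC / S-BIC) weights: w_m proportional to exp(-IC_m / 2).

    Scores are shifted by their minimum for numerical stability;
    the shift cancels in the normalization.
    """
    ic = np.asarray(ic_scores, dtype=float)
    w = np.exp(-(ic - ic.min()) / 2.0)
    return w / w.sum()

# The model with the smallest information criterion gets the largest weight.
w = smoothed_ic_weights([100.0, 102.0, 110.0])
```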
The data generating process is as follows: the predictor variable is …, and the parameter function is …, where … is a basis function and … is the number of basis functions. Here, we use a B-spline basis and a Fourier basis. For the B-spline basis, we choose the order of the basis functions to be 2 and the number of basis functions to be 20. For the Fourier basis, we choose the number of basis functions as 21, with the first basis being a constant function.
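As an illustration of the Fourier basis with a constant first function, the sketch below evaluates such a basis on a grid; this is a hypothetical helper of our own, not the code used in the paper:

```python
import numpy as np

def fourier_basis(t, K=21, period=1.0):
    """Evaluate a Fourier basis at points t: a constant first function
    followed by sine/cosine pairs, K functions in total (K odd here)."""
    t = np.asarray(t, dtype=float)
    B = np.empty((t.size, K))
    B[:, 0] = 1.0                          # constant first basis function
    for j in range(1, (K - 1) // 2 + 1):
        w = 2.0 * np.pi * j / period
        B[:, 2 * j - 1] = np.sin(w * t)    # odd columns: sines
        B[:, 2 * j] = np.cos(w * t)        # even columns: cosines
    return B

grid = np.linspace(0.0, 1.0, 101)
B = fourier_basis(grid, K=21)
```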
In our simulation, the following four cases are considered.
- Case 1
For …, the … are generated from the standard normal distribution …; for …, …. The basis functions are B-spline functions with the parameters mentioned above.
- Case 2
For …, …. The basis functions are B-spline functions with the parameters mentioned above.
- Case 3
For …, the … are generated from the standard normal distribution …; for …, …. The basis functions are Fourier functions with the parameters mentioned above.
- Case 4
For …, …. The basis functions are Fourier functions with the parameters mentioned above.
We set the term … to be independently generated from …, where …. The response variable is generated from a binomial distribution with probability …. We consider three types of link function …: the logistic link function, the probit link function, and the Poisson link function. For the Poisson model, we only consider the simulations with … for Cases 1–4.
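A hedged sketch of drawing a binary response through the logistic link; the design matrix Z and coefficient vector beta below are stand-ins for the basis-coefficient scores and parameters of the actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    """Logistic (inverse-logit) link: maps the linear predictor to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

n, p = 200, 20
Z = rng.standard_normal((n, p))        # stand-in for basis-coefficient scores
beta = rng.standard_normal(p)          # stand-in for the parameter vector
eta = Z @ beta                         # linear predictor
y = rng.binomial(1, logistic(eta))     # Bernoulli (binomial, one trial) response
```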
In the simulation, we use FPCA to obtain the nested candidate models; the …th candidate model contains the first … principal components. The number of candidate models is 18 for Cases 1–2 and 19 for Cases 3–4. We then adopt the iteratively reweighted least squares algorithm, a standard approach for generalized linear models, to obtain the estimates for each candidate model. For the weights, we use the 'fmincon' function in MATLAB to minimize the CV criterion.
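The weight optimization can be reproduced outside MATLAB with a constrained solver in place of 'fmincon'. The sketch below uses SciPy's SLSQP and a squared-error stand-in for the CV criterion (the paper's criterion is a cross-validation criterion for the GLM; the point here is only the handling of the CV1 simplex constraint versus the CV2 nonnegativity-only constraint):

```python
import numpy as np
from scipy.optimize import minimize

def cv_weights(P, y, sum_to_one=True):
    """Choose averaging weights minimizing a squared-error criterion
    over the weighted out-of-fold predictions P @ w.

    P : (n, M) cross-validated predictions of the M candidate models
    y : (n,) responses
    sum_to_one : enforce sum(w) = 1 (CV1); otherwise only w >= 0 (CV2)
    """
    n, M = P.shape
    def crit(w):
        r = P @ w - y
        return r @ r / n
    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}] if sum_to_one else []
    res = minimize(crit, np.full(M, 1.0 / M), method="SLSQP",
                   bounds=[(0.0, None)] * M, constraints=cons)
    return res.x

rng = np.random.default_rng(1)
P = rng.random((50, 3))
y = rng.integers(0, 2, 50).astype(float)
w1 = cv_weights(P, y, sum_to_one=True)   # CV1-style weights
```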
The sample size is set as …. We use 80% of the data as the training set, of size …, and the remaining data as the test set, of size …. We then compare the prediction errors. We calculate the prediction accuracy, the fitting accuracy, the predictor coefficient prediction accuracy, and the predictor coefficient fitting accuracy. We repeat this process 1000 times and then obtain the mean, median, and variance of these prediction errors for each method. To save space, we present only the results on prediction accuracy; the results on the other types of accuracy are available from the authors upon request. We report only the results for the logistic link function due to space limitations; results for the other link functions are also available from the authors.
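The split-and-evaluate step of one replication can be sketched as follows (toy data; prediction_error is the misclassification rate of thresholded predicted probabilities):

```python
import numpy as np

rng = np.random.default_rng(2)

def split_indices(n, train_frac=0.8, rng=rng):
    """Randomly partition indices 0..n-1 into training and test sets."""
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]

def prediction_error(y_true, p_hat):
    """Misclassification rate of probabilities thresholded at 0.5."""
    return np.mean((p_hat > 0.5).astype(int) != y_true)

# one toy replication
y = rng.integers(0, 2, 100)
p_hat = rng.random(100)                  # stand-in for model-averaged probabilities
tr, te = split_indices(100)
err = prediction_error(y[te], p_hat[te])
```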
For Case 1, the prediction errors are summarized in Table A1, Table A2, and Table A3. From Table A1, it is seen that as R varies from 1 to 10, the prediction errors decrease, because the difference in probability between the two groups (the group whose response is 1 and the group whose response is 0) becomes larger. Our methods (CV1 and CV2 in the tables) always attain the minimum error means (Mean in the tables), medians (Median in the tables), and variances (Var in the tables). However, there is no clear tendency between CV1 and CV2, which perform similarly in most situations. When R is small, BIC is always better than AIC, and S-BIC is always better than S-AIC. This may be because fewer parameters are useful for smaller R values, in which case a bigger penalty on the number of parameters in the model is preferred. Moreover, when the candidate models differ significantly, AIC and BIC perform similarly to S-AIC and S-BIC, respectively. As R becomes larger, the differences between AIC and BIC and between S-AIC and S-BIC become smaller. FPCA is always superior to AIC, BIC, S-AIC, and S-BIC, and their differences become larger as R increases. We now turn to Table A2 and Table A3. As the sample size n increases from 60 to 200 and 500, the prediction errors decrease for each fixed R. The median and variance of the prediction errors also become smaller. AIC and BIC behave increasingly similarly. CV1 and CV2 are still the best among all the methods, followed by FPCA.
For Case 2, the prediction errors are given in Table A4, Table A5, and Table A6. As before, CV1 and CV2 perform the best, followed by FPCA. Likewise, S-AIC and S-BIC are better than AIC and BIC, respectively. In Table A4, as R varies from 1 to 10, the prediction errors decrease, except for the FPCA method, which attains its minimum at … with a small fluctuation. CV1 and CV2 perform equally well for different R values and sample sizes. The difference between AIC and BIC becomes small as the sample size increases; a similar phenomenon is observed for S-AIC and S-BIC.
For Case 3, the prediction errors are provided in Table A7, Table A8, and Table A9. In Table A7, CV1 or CV2 is the best when R is between 1 and 5. However, when R is between 6 and 10, the two model selection methods, AIC and BIC, are the best. Similar conclusions can be drawn from Table A8 and Table A9, although in the latter case CV1 actually performs the best for all R values. The error rates of all methods become smaller as R increases from 1 to 6 and then larger as R varies from 7 to 10.
For Case 4, the prediction errors are presented in Table A10, Table A11, and Table A12. In Table A10, CV1, CV2, and BIC are the best, followed by AIC. In this design, S-AIC and S-BIC are not better than AIC and BIC. In Table A11, BIC is the best, followed by AIC. In Table A12, CV1 always performs the best, followed by BIC.
In summary, for out-of-sample prediction, our methods CV1 and CV2 perform the best in most cases and have smaller variances and medians of errors. Furthermore, CV1 and CV2 often perform equally well. This indicates that removing the restriction on the sum of weights may not lead to a better model averaging estimator.
4.2. Simulation II: Divergent Number of Candidate Models
We consider situations where the number of candidate models tends to ∞ as the sample size increases. We set the sample size n to 200, 400, and 1000, and the number of candidate models to … (so M = 18, 36, and 90 for the three sample sizes). The data generating process is as before: the predictor variable is … and the parameter function is …, where … is a second-order B-spline basis function, …, and …. For …, …. We set the term … to be independently generated from …, where …. The response variable is generated from a binomial distribution with the logistic link.
The candidate models are nested, and the algorithms used in the calculations are the same as those described in Section 4.1. For the simulation results, we report the errors of the seven methods considered in Section 4.1. From Table A13, Table A14, and Table A15, our methods, CV1 and CV2, perform the best in most cases, followed by FPCA and S-AIC. The difference between AIC and BIC, and between S-AIC and S-BIC, decreases with increasing R.
4.3. Application: Beijing Second-Hand House Price Data
We apply our method to Beijing second-hand housing transaction price data, which were collected from the internet by the Guoxinda Group Corporation. Most of the data have passed a manual check. The data include the second-hand housing prices and surrounding-environment variables of 2318 residential areas in Beijing. The second-hand housing price data are monthly, from January 2015 to December 2017, for each residential area.
Our aim is to predict the level of increase in house prices in the next year. We are concerned with the relationship between the level of price rise and the past housing price curves. We use the median of the online listing prices of houses in a residential area as the house price for that residential area, and use the price curve of each residential area from January 2015 to December 2016 as a predictor variable. The response variable is binary, taking the value 1 if the rising ratio is high and 0 otherwise. Here, we define the rising ratio for each residential area as the ratio of the average monthly price in 2017 to the average monthly price in 2016. The 25%, 50%, and 75% quantile ratios are …, …, and …, respectively. We focus on the residential areas whose housing prices are rising rapidly: if the ratio is higher than the 75% quantile ratio of all residential areas, the response variable of that residential area takes the value 1, and 0 otherwise. Of the residential areas, 568 are rising fast, and 1750 are not.
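The construction of the binary response from the rising ratios can be sketched as follows, with simulated prices standing in for the proprietary data:

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical monthly prices per residential area: (areas, months)
prices_2016 = rng.uniform(3.0, 9.0, size=(2318, 12))
prices_2017 = prices_2016 * rng.uniform(0.9, 1.5, size=(2318, 12))

# rising ratio: average monthly price in 2017 over average monthly price in 2016
ratio = prices_2017.mean(axis=1) / prices_2016.mean(axis=1)
threshold = np.quantile(ratio, 0.75)     # 75% quantile of the rising ratios
y = (ratio > threshold).astype(int)      # 1 = rising fast, 0 = otherwise
```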
For simplicity, we standardize all the price data. For each group, we plot the housing price trajectories in Figure 1. The failure to visually detect differences between the groups could be attributed to the overcrowding of these plots with too many curves, but when fewer curves are displayed (lower panels of Figure 1), the same phenomenon remains. With a few exceptions, no clear visual differences between the two groups can be discerned. On the whole, the yearly trajectories from 2015 to 2016 are not very different. Therefore, the discrimination task at hand is difficult.
We randomly select 75% of all residential areas as the training set, of size 1739, and the rest as the test set, of size 579. We use the logistic link and B-spline functions to fit the house price curves. The number of basis functions is 6, and the order of the B-spline basis functions is 2. Then, we adopt functional principal component analysis (Yao et al. 2005) to build data-adaptive basis functions to reduce the dimension and deal with the correlations in the house price time series.
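For densely and regularly observed curves, a simplified FPCA via the SVD of the centered data matrix illustrates how scores and explained-variance fractions are obtained (a sparse or irregular setting would call for the PACE approach of Yao et al. 2005; this sketch is our own simplification):

```python
import numpy as np

rng = np.random.default_rng(4)

def fpca_scores(X, n_components):
    """Functional PCA for densely observed curves via SVD of the centered
    data matrix X (curves in rows, grid points in columns). Returns the
    principal-component scores and each component's explained-variance
    fraction."""
    Xc = X - X.mean(axis=0)                       # center the curves
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

X = rng.standard_normal((100, 24))                # e.g., 24 monthly observations
scores, explained = fpca_scores(X, 3)
```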
We compare the out-of-sample prediction errors of the seven methods in Section 4. We repeat every method 20 times. The results are summarized in Table 1 and Table 2. It can be observed from the tables that the error of the CV1 or CV2 method is about 10% lower on average than those of the other methods, and overall, CV1 and CV2 behave similarly. As in the simulations above, this indicates that the constraint that the sum of weights equals 1 makes sense in practical cases. AIC and BIC perform equally well, as both choose the largest model in most cases. We also find that FPCA is better than AIC or BIC; FPCA always selects the smallest model because the cumulative contribution of the first principal component is about 98%. Further, it is clear that the fitting error and prediction error of FPCA are similar, whereas for the other methods the fitting errors are always slightly smaller than the prediction errors.