# A Kolmogorov-Smirnov Based Test for Comparing the Predictive Accuracy of Two Sets of Forecasts

## Abstract

## 1. Introduction

## 2. Theoretical Foundation

#### 2.1. The Kolmogorov-Smirnov (KS) Test

#### 2.2. Testing for Statistically Significant Differences between the Distribution of Two Sets of Forecast Errors

#### 2.3. Testing for the Lower Stochastic Error

## 3. Simulation Results

#### 3.1. Size of the Test

**Table 1.** Percentage of rejections of the true null hypothesis of equal prediction mean squared errors for the Diebold-Mariano (DM) test and of equal distribution of squared prediction errors for the Kolmogorov-Smirnov Predictive Accuracy (KSPA) test at the nominal 10% level.

h | Error Distribution | Test | n = 8 | n = 16 | n = 32 | n = 64 | n = 128 | n = 256 | n = 512 |
---|---|---|---|---|---|---|---|---|---|
1 | Gaussian | DM | 8.4 | 9.6 | 9.7 | 10.1 | 9.9 | 10.4 | 10.6 |
1 | Gaussian | KSPA | 8.6 | 9.4 | 8.9 | 9.6 | 8.4 | 9.4 | 8.6 |
1 | Uniform | KSPA | 9.1 | 8.9 | 8.6 | 9.4 | 8.9 | 8.9 | 8.5 |
1 | Cauchy | KSPA | 9.0 | 9.1 | 8.4 | 9.2 | 8.5 | 8.9 | 8.6 |
1 | Student’s t | KSPA | 8.5 | 9.4 | 9.3 | 9.5 | 9.0 | 8.7 | 8.6 |
2 | Gaussian | DM | 16.4 | 14.2 | 12.2 | 11.2 | 10.8 | 10.5 | 10.3 |
2 | Gaussian | KSPA | 9.0 | 9.5 | 8.5 | 9.2 | 8.6 | 9.1 | 8.4 |
2 | Uniform | KSPA | 9.1 | 9.4 | 8.9 | 9.8 | 8.8 | 9.2 | 8.8 |
2 | Cauchy | KSPA | 9.3 | 9.5 | 9.0 | 9.3 | 8.8 | 9.4 | 9.0 |
2 | Student’s t | KSPA | 8.7 | 9.3 | 9.1 | 9.1 | 8.4 | 9.7 | 8.9 |
3 | Gaussian | DM | 18.1 | 18.5 | 14.3 | 12.2 | 10.7 | 10.8 | 10.9 |
3 | Gaussian | KSPA | 8.6 | 9.6 | 8.7 | 9.2 | 8.7 | 9.1 | 9.1 |
3 | Uniform | KSPA | 8.7 | 9.8 | 9.0 | 9.2 | 8.6 | 9.4 | 8.7 |
3 | Cauchy | KSPA | 8.4 | 9.4 | 9.3 | 9.7 | 8.7 | 9.5 | 8.7 |
3 | Student’s t | KSPA | 8.2 | 9.7 | 8.8 | 9.5 | 8.9 | 9.1 | 8.6 |
4 | Gaussian | DM | 16.3 | 19.8 | 16.1 | 13.4 | 11.5 | 10.9 | 11.0 |
4 | Gaussian | KSPA | 8.5 | 9.4 | 8.3 | 8.9 | 8.6 | 9.2 | 9.0 |
4 | Uniform | KSPA | 8.7 | 9.6 | 8.6 | 9.2 | 9.4 | 9.6 | 9.1 |
4 | Cauchy | KSPA | 8.4 | 9.4 | 9.0 | 9.4 | 9.6 | 9.7 | 8.7 |
4 | Student’s t | KSPA | 8.7 | 9.1 | 8.8 | 9.9 | 8.7 | 9.7 | 8.8 |
5 | Gaussian | DM | 12.9 | 19.9 | 17.8 | 14.9 | 12.2 | 11.1 | 11.0 |
5 | Gaussian | KSPA | 8.4 | 9.4 | 8.9 | 9.4 | 8.3 | 9.7 | 8.3 |
5 | Uniform | KSPA | 8.2 | 9.2 | 8.7 | 9.1 | 8.4 | 9.3 | 8.9 |
5 | Cauchy | KSPA | 8.8 | 9.6 | 8.5 | 9.5 | 9.0 | 8.8 | 8.9 |
5 | Student’s t | KSPA | 8.4 | 9.3 | 9.1 | 9.9 | 9.1 | 9.6 | 8.6 |
6 | Gaussian | DM | 10.6 | 19.8 | 18.8 | 16.0 | 12.9 | 11.4 | 11.2 |
6 | Gaussian | KSPA | 8.6 | 9.5 | 8.9 | 9.5 | 8.6 | 9.1 | 9.0 |
6 | Uniform | KSPA | 8.7 | 9.4 | 8.8 | 9.1 | 8.4 | 9.2 | 8.3 |
6 | Cauchy | KSPA | 8.9 | 9.8 | 9.1 | 9.9 | 8.5 | 9.2 | 8.6 |
6 | Student’s t | KSPA | 8.7 | 9.3 | 8.8 | 9.4 | 9.0 | 9.8 | 9.1 |
7 | Gaussian | DM | 9.9 | 18.2 | 19.5 | 16.8 | 13.6 | 11.6 | 11.4 |
7 | Gaussian | KSPA | 8.6 | 9.5 | 9.3 | 8.9 | 8.8 | 9.3 | 9.0 |
7 | Uniform | KSPA | 8.4 | 9.0 | 8.7 | 9.9 | 9.0 | 9.1 | 8.7 |
7 | Cauchy | KSPA | 8.5 | 9.2 | 8.7 | 9.1 | 9.0 | 9.4 | 8.9 |
7 | Student’s t | KSPA | 8.8 | 9.1 | 9.0 | 9.0 | 8.6 | 8.8 | 9.2 |
8 | Gaussian | DM | - | 17.4 | 20.2 | 18.0 | 13.8 | 11.9 | 11.4 |
8 | Gaussian | KSPA | - | 9.3 | 8.6 | 9.1 | 8.5 | 9.5 | 8.7 |
8 | Uniform | KSPA | - | 9.5 | 8.7 | 9.8 | 9.0 | 9.7 | 8.7 |
8 | Cauchy | KSPA | - | 9.5 | 8.3 | 9.2 | 8.8 | 8.9 | 8.9 |
8 | Student’s t | KSPA | - | 9.7 | 8.3 | 9.6 | 8.6 | 9.1 | 9.1 |
9 | Gaussian | DM | - | 15.1 | 20.2 | 19.0 | 14.7 | 12.4 | 11.6 |
9 | Gaussian | KSPA | - | 9.5 | 8.6 | 9.2 | 8.5 | 9.4 | 8.8 |
9 | Uniform | KSPA | - | 9.4 | 9.0 | 9.7 | 8.0 | 9.5 | 8.9 |
9 | Cauchy | KSPA | - | 9.8 | 8.6 | 8.9 | 8.6 | 9.4 | 8.8 |
9 | Student’s t | KSPA | - | 9.1 | 8.6 | 9.2 | 8.9 | 9.6 | 9.0 |
10 | Gaussian | DM | - | 14.0 | 20.2 | 19.1 | 15.1 | 12.6 | 11.8 |
10 | Gaussian | KSPA | - | 9.2 | 8.9 | 9.3 | 8.7 | 9.7 | 9.0 |
10 | Uniform | KSPA | - | 9.2 | 8.7 | 9.8 | 8.7 | 9.1 | 9.4 |
10 | Cauchy | KSPA | - | 9.2 | 8.8 | 9.7 | 9.1 | 9.5 | 9.3 |
10 | Student’s t | KSPA | - | 9.3 | 8.8 | 9.0 | 8.7 | 9.1 | 8.6 |
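
The kind of experiment summarised in Table 1 can be approximated with a short Monte Carlo sketch. The version below is only an illustration under simplifying assumptions that are not taken from the paper: both sets of errors are i.i.d. draws from the same distribution, they are independent of each other, and the Student's t distribution uses 5 degrees of freedom. The helper name `kspa_size` is hypothetical.

```r
# Monte Carlo sketch of the empirical size of the two-sided KSPA test.
# ASSUMPTION: both error series are i.i.d. draws from the same distribution
# and independent of each other; this is not the paper's exact design.
set.seed(123)

kspa_size <- function(rdist, n, reps = 2000, level = 0.10) {
  rejections <- replicate(reps, {
    e1 <- rdist(n)                     # errors from "model 1"
    e2 <- rdist(n)                     # errors from "model 2", same distribution
    p  <- ks.test(e1^2, e2^2)$p.value  # two-sided KS test on squared errors
    p < level
  })
  mean(rejections)                     # share of (false) rejections
}

# Error distributions named in Table 1.
# ASSUMPTION: df = 5 for Student's t and the (-1, 1) range for the uniform.
dists <- list(
  Gaussian = function(n) rnorm(n),
  Uniform  = function(n) runif(n, -1, 1),
  Cauchy   = function(n) rcauchy(n),
  Student  = function(n) rt(n, df = 5)
)

size_n32 <- sapply(dists, kspa_size, n = 32)
print(round(100 * size_n32, 1))        # rejection rates in percent
```

Because the exact two-sample KS p-value is discrete in small samples, rejection rates from a sketch like this would be expected to sit at or somewhat below the nominal level rather than exactly on it.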

#### 3.2. Power of the Test

**Table 2.** Percentage of rejections of the false null hypothesis of equal one-step prediction mean squared errors for the Diebold-Mariano (DM) test and of equal one-step distribution of squared prediction errors for the KSPA test at the nominal 10% level.

Combinations | Test | n = 8 | n = 16 | n = 32 | n = 64 | n = 128 | n = 256 | n = 512 |
---|---|---|---|---|---|---|---|---|
Case 1 | DM | 7.3 | 17.5 | 31.9 | 37.3 | 39.3 | 40.3 | 40.9 |
Case 1 | KSPA | 19.6 | 35.8 | 61.0 | 91.7 | 99.9 | 100.0 | 100.0 |
Case 2 | DM | 5.2 | 13.4 | 26.5 | 35.4 | 39.5 | 41.0 | 40.8 |
Case 2 | KSPA | 15.9 | 25.8 | 42.0 | 75.3 | 97.6 | 100.0 | 100.0 |
Case 3 | DM | 59.3 | 96.0 | 99.7 | 100.0 | 100.0 | 100.0 | 100.0 |
Case 3 | KSPA | 65.1 | 92.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
Case 4 | DM | 91.6 | 99.7 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
Case 4 | KSPA | 97.3 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |

… χ² distribution with 3 d.f. against errors from a χ² distribution with 10 d.f.
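
To see how rejection rates of this kind arise, the sketch below estimates the power of the two-sided KSPA test when one set of errors comes from a χ² distribution with 3 d.f. and the other from a χ² distribution with 10 d.f., the pairing mentioned in the note above. It is a rough illustration only: the i.i.d. sampling, the independence of the two series, the number of replications, and the helper name `kspa_power` are assumptions, and the sketch is not claimed to reproduce any particular row of Table 2.

```r
# Monte Carlo sketch of the power of the two-sided KSPA test.
# ASSUMPTION: errors are i.i.d. chi-squared draws (3 d.f. vs. 10 d.f.) and the
# two series are independent; this is not the paper's exact design.
set.seed(2024)

kspa_power <- function(n, reps = 1000, level = 0.10) {
  mean(replicate(reps, {
    e1 <- rchisq(n, df = 3)                 # errors from the first model
    e2 <- rchisq(n, df = 10)                # errors from the second model
    ks.test(e1^2, e2^2)$p.value < level     # reject equal distributions?
  }))
}

sample_sizes <- c(8, 16, 32, 64)
power_by_n <- sapply(sample_sizes, kspa_power)
names(power_by_n) <- paste0("n = ", sample_sizes)
print(round(100 * power_by_n, 1))           # rejection rates in percent
```

With distributions this far apart, the estimated power should already be high at small sample sizes and approach 100% as n grows.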

## 4. Empirical Evidence

#### 4.1. Scenario 1: Tourism Series

**Figure 1.** U.S. tourist arrivals forecast, distribution of errors, and empirical cumulative distribution functions (c.d.f.) of errors.

Test | Two-Sided (p-Value) | One-Sided (p-Value) |
---|---|---|
Modified DM | <0.01 * | N/A |
KSPA | <0.01 * | <0.01 * |

#### 4.2. Scenario 2: Accidental Deaths Series

Test | Two-Sided (p-Value) | One-Sided (p-Value) |
---|---|---|
DM | 0.04 * | N/A |
Modified DM | N/A | N/A |
KSPA | 0.03 * | 0.02 * |

#### 4.3. Scenario 3: Trade Series

Test | Two-Sided (p-Value) | One-Sided (p-Value) |
---|---|---|
Modified DM | 0.30 | N/A |
KSPA | 0.17 | 0.08 * |

## 5. Conclusions

## Supplementary Materials

## Appendix: R Code for the KSPA Test

    # The "stats" package ships with base R and is attached by default,
    # so it does not need to be installed; library() is kept for clarity.
    library(stats)

    # Input the forecast errors from two models. Let Error1 hold the errors
    # from the model with the lower error based on some loss function.
    Error1 <- scan()
    Error2 <- scan()

    # Convert the raw forecast errors into absolute values or squared values,
    # depending on the loss function.
    abs1 <- abs(Error1)
    abs2 <- abs(Error2)
    sqe1 <- Error1^2
    sqe2 <- Error2^2

    # Perform the KSPA tests for distinguishing between the predictive
    # accuracy of forecasts from the two models*.
    # Two-sided KSPA test:
    ks.test(abs1, abs2)
    # One-sided KSPA test (alternative: the c.d.f. of abs1 lies above that of abs2):
    ks.test(abs1, abs2, alternative = "greater")

    # OPTIONAL GRAPHS FOR MORE INFORMATION
    # Draw histograms for the forecast errors from each model.
    par(mfrow = c(1, 2))
    hist(abs1, xlab = "Model 1 Absolute Errors", main = "")
    hist(abs2, xlab = "Model 2 Absolute Errors", main = "")

    # Plot the c.d.f. of forecast errors from each model*.
    par(mfrow = c(1, 1))  # back to a single panel so both c.d.f.s share one plot
    plot(ecdf(abs1), do.points = TRUE, col = "red",
         xlim = range(abs1, abs2), main = "")
    plot(ecdf(abs2), do.points = TRUE, col = "blue", add = TRUE)
    legend("bottomright",
           legend = c("Model 1 Absolute Errors", "Model 2 Absolute Errors"),
           lty = 1, col = c("red", "blue"))

    # NOTE: *Replace abs1 and abs2 with sqe1 and sqe2 as appropriate.
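
The appendix code reads the errors interactively with `scan()`. For scripted use, the same two `ks.test()` calls can be wrapped in a small function. The sketch below is one possible way to do that; the function name `kspa_test`, its `loss` argument, and the simulated error vectors are hypothetical and serve only to show the calls end to end.

```r
# Hypothetical convenience wrapper (not part of the paper's code): applies the
# two-sided and one-sided KSPA tests to two vectors of forecast errors.
kspa_test <- function(error1, error2, loss = c("absolute", "squared")) {
  loss <- match.arg(loss)
  transform <- if (loss == "absolute") abs else function(e) e^2
  l1 <- transform(error1)   # losses of the model presumed more accurate
  l2 <- transform(error2)   # losses of the competing model
  list(
    two_sided = ks.test(l1, l2)$p.value,
    one_sided = ks.test(l1, l2, alternative = "greater")$p.value
  )
}

# Simulated errors for illustration only (ASSUMPTION: not real forecast errors).
set.seed(1)
error1 <- rnorm(100, sd = 1)   # smaller errors
error2 <- rnorm(100, sd = 5)   # larger errors
result <- kspa_test(error1, error2, loss = "absolute")
print(result)
```

As in the appendix, the first argument should hold the errors from the model with the lower error, since `alternative = "greater"` tests whether the c.d.f. of the first sample lies above that of the second.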

## References

^{2} See [14] for the calculation and interpretation of the RRMSE criterion.

^{3} Data source: http://travel.trade.gov/research/monthly/arrivals/.

^{4} Data source: http://www.bea.gov/international/index.htm.

