Professional Forecasters vs. Shallow Neural Network Ensembles: Assessing Inflation Prediction Accuracy
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
See file
Comments for author File: Comments.pdf
Author Response
Dear Sir/Madam
Please see the attached updated manuscript following our revisions, and also our response file; thank you.
We hope that you will find our responses to your satisfaction.
We have added an acknowledgement of the review in footnote 1.
We look forward to hearing from you in due course.
Best wishes
Jane
Professor Jane M Binner
Chair of Finance
Department of Finance
Birmingham Business School
University of Birmingham
Edgbaston
Birmingham
B15 2TY
Email j.m.binner@bham.ac.uk
Author Response File: Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
Referee report on
Professional Forecasters vs. Shallow Neural Network Ensembles: Assessing Inflation Prediction Accuracy
By Jane M. Binner, Logan J. Kelly and Jonathan A. Tepper
The manuscript is fairly impressive as it demonstrates how a neural network with a large number of parameters performs well with a fairly small training database. Forecasts made with the neural network perform well against the average forecast of the survey of professional forecasters. I have 2 minor suggestions on substance and a series of complaints about presentation.
1. The first comment on content is that the log of the CPI and the change in CPI inflation are almost redundant, as the change in inflation is almost exactly
ln(cpi_t) - 2 ln(cpi_{t-1}) + ln(cpi_{t-2}).
This means that the model is identified only by functional form, which creates high risks of overfitting. I think that just ln(cpi) would work better. I actually suggest ln(cpi inflation).
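To spell out the algebra behind this point (my own sketch, assuming inflation is defined as the first difference of the log CPI; if the authors use a percentage change instead, the identity holds only approximately, hence "almost exactly"):

```latex
% With monthly inflation defined as \pi_t = \ln(cpi_t) - \ln(cpi_{t-1}),
% the change in inflation is the second difference of the log CPI:
\Delta\pi_t = \pi_t - \pi_{t-1}
            = \ln(cpi_t) - 2\ln(cpi_{t-1}) + \ln(cpi_{t-2})
```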
2. The second comment on content is that the survey of professional forecasters contains outlying forecasts. I think that looking at the median forecast would be interesting. I absolutely do not insist that Binner, Kelly, and Tepper do this in a revised draft (maybe just to satisfy their curiosity).
My complaints about presentation are much more numerous (and include some allegations of actual mistakes in the equations or text). I am complaining about equations and notation, but will not even use an equation editor (sorry).
My alleged actual errors
I think that in equation 2, z_i is actually z_j, as it is a function of v_j, which is indexed by j. Also, I think it is much better to write (1 - v_j) in equation 2 rather than introducing z at all.
Second, equation 2 gives c_i as a function of the jth element of a vector. I think that is just not correct. It is c_i = f(a vector), where the jth element of the vector is f(the stuff in parentheses).
The authors claim that the number of lags is determined empirically. In fact they set it to 120 and don’t describe any empirical determination of that number (which is the number of months in 10 years).
I think the variable a_j should be a_i. This is my reading of the figure: I think it is the output activation, hidden-unit activation, or external input value which corresponds to c_i. In the figure the weights v_j are 0.25, 0.5, 0.75 and 1, and they are all multiplied by the same output activation, hidden-unit activation, or external input value, a_i, not a_j. I may have incorrectly guessed what is going on in equation 2 (which will be discussed in my complaints about the clarity and specificity of the presentation).

The last (un-numbered) equation makes no sense to me. I expect the MSE to be a function of the observed inflation rate y_l(t). I assume that is written as o_l(l) or yhat_l(t), but I don't know which. In the figure the vector of forecasts made by the neural network is called yhat. In the text, the l-months-ahead forecast is o(l). I would change the notation so that y_l(t) appears.
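For concreteness, here is the sort of expression I would expect for that final equation (my guess at consistent notation, not the authors'; T is the number of forecast dates):

```latex
% MSE at horizon l, comparing observed inflation y_l(t) with the
% network's l-month-ahead forecast \hat{y}_l(t):
\mathrm{MSE}_l = \frac{1}{T} \sum_{t=1}^{T} \bigl( y_l(t) - \hat{y}_l(t) \bigr)^2
```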
I think “2 to 5” should be “3 to 5”
On clarity and presentation.
First, I really think it is not useful to present neural networks in general. It is not useful to present functions g() and f() without writing what functions were used. I strongly suspect that f = 0.35 (sum as j goes from 1 to 4 of ()), but I don't really know this. I don't even know that j is 1, 2, 3, or 4. The notation a_j suggests that j goes from 1 to (66 + 120A), where A is 3, 4, or 5. I am pretty sure that equation 2 should have a_i, not a_j, and that j goes from 1 to 4, but I don't know this.
Binner, Kelly, and Tepper choose gamma = 4. I think that from then on they should write 4.
I think they should state the dimensionality of the hidden layer (50) and of the input vectors (120A), where A is the number of independent variables (3, 4, or 5). I think they should state the dimensionality of c, which I think is (15 + 15 + 120A), and always write "where A is 3, 4, or 5".
I think the sums should always be written as the sum of the variable from 1 to its upper limit, not just the sum over the variable (this restates the dimensionality of the vectors and makes things clearer to the reader).
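For example (illustrative notation only; the symbols N, v_j, and a_j are placeholders, not the authors'):

```latex
% Explicit bounds restate the vector dimensionality for the reader:
\sum_{j=1}^{N} v_j a_j
\quad \text{rather than} \quad
\sum_{j} v_j a_j
```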
There is no point in introducing the function rho and then saying it is the identity; I would just delete the rho, the opening big parenthesis, and the closing big parenthesis.
“To facilitate convergent learning, a linearly decaying learning rate and momentum schedule are used for all training experiments with an initial learning rate of 0.0001 and momentum term of 0.95. This provides an effective time varying learning rate that guarantees convergence of stochastic approximation algorithms and has proven effective for temporal domains”
is very interesting. I think it would be useful to make it clear that, in this context, "time" refers to the time over which the computer learns, not the months and years over which observed inflation occurs. They might just delete the word "time", which might be confusing, or replace it with "a learning rate which declines over learning iterations". Or, failing that, I give up on making it clear and have no useful suggestion.
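A minimal sketch of what such a schedule might look like, for readers unfamiliar with the idea (the initial values 0.0001 and 0.95 are quoted from the manuscript; the final values, the iteration count, and all names here are my assumptions):

```python
def linear_decay(initial, final, total_iters):
    """Value of a linearly decaying hyperparameter at a given learning
    iteration (an optimizer step, not a calendar month of data)."""
    def at(iteration):
        frac = min(iteration / total_iters, 1.0)
        return initial + frac * (final - initial)
    return at

# Quoted initial values; final values and iteration count are assumed.
lr_at = linear_decay(initial=1e-4, final=0.0, total_iters=10_000)
momentum_at = linear_decay(initial=0.95, final=0.0, total_iters=10_000)

for t in range(10_000):
    lr, mu = lr_at(t), momentum_at(t)
    # An SGD-with-momentum step here would be:
    #   velocity = mu * velocity - lr * gradient
    #   weights  = weights + velocity
```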
“To minimize the onset of over-fitting, in addition to the ensemble-based methodology discussed later, we apply L2 weight regularization during training with the lambda constant set at 0.001.”
Again interesting but not clear. Binner, Kelly, and Tepper can’t be expected to explain L2 weight regularization, but a citation of an article or book with an explanation would be nice.
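For readers who want the mechanics anyway: L2 weight regularization adds a penalty proportional to the sum of squared weights to the training loss, shrinking weights toward zero. A minimal sketch (the lambda value 0.001 is quoted from the manuscript; everything else is illustrative):

```python
import numpy as np

LAMBDA = 0.001  # L2 constant quoted in the manuscript

def l2_penalized_loss(mse, weight_matrices):
    """Base loss plus lambda times the sum of squared weights."""
    penalty = LAMBDA * sum(np.sum(w ** 2) for w in weight_matrices)
    return mse + penalty

def l2_gradient_contribution(w):
    """Extra gradient term the penalty contributes for a weight matrix."""
    return 2.0 * LAMBDA * w
```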
“We generate independent training sequences directly from the time-series using a time window whose lag size is empirically established, i.e. 120 months for the US inflation problem tackled here. The MRN context units are initialized to known values at the beginning of each sequence.”
Again I object to "empirically". I note that empirical establishment of the time horizon would be data snooping. I guess it might mean "empirically established in the literature on inflation". I also object to "known": I think they mean fixed or imposed. "Known" implies correct.
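To make the windowing concrete, a minimal sketch of how such training sequences might be generated (the 120-month window is quoted from the manuscript; the one-step-ahead target and all names are my assumptions, and the authors in fact forecast at several horizons):

```python
import numpy as np

def make_training_sequences(series, window=120):
    """Slice a monthly series into independent training sequences:
    each input is a 120-month window, the target the following month."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return np.array(inputs), np.array(targets)

# Example: 600 months of data yield 480 (window, next-month) pairs.
X, y = make_training_sequences(np.random.randn(600))
print(X.shape, y.shape)  # (480, 120) (480,)
```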
In conclusion, I think the paper is very interesting; my complaints are mainly about the clarity of presentation.
Author Response
Dear Sir/Madam
We feel the anonymous reviewer has provided very helpful and insightful comments.
We have responded in full, and we feel that our revised manuscript is much improved as a result of the review process.
Our updated manuscript is attached.
With best wishes
Yours sincerely
Jane
Professor Jane M Binner
Chair of Finance
Department of Finance
Birmingham Business School
University of Birmingham
Edgbaston
Birmingham
B15 2TY
Email j.m.binner@bham.ac.uk
Author Response File: Author Response.docx