Article
Peer-Review Record

Random Noise vs. State-of-the-Art Probabilistic Forecasting Methods: A Case Study on CRPS-Sum Discrimination Ability

Appl. Sci. 2022, 12(10), 5104; https://doi.org/10.3390/app12105104
by Alireza Koochali 1,2,3,*, Peter Schichtel 1, Andreas Dengel 2,3 and Sheraz Ahmed 2
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 27 April 2022 / Revised: 14 May 2022 / Accepted: 17 May 2022 / Published: 19 May 2022
(This article belongs to the Topic Machine and Deep Learning)

Round 1

Reviewer 1 Report

This paper presents a systematic evaluation of CRPS-Sum to understand its discrimination ability when evaluating univariate and multivariate probabilistic forecasts. A case study is conducted on the exchange-rate dataset to compare CRPS-Sum, CRPS, and the energy score among a dummy univariate model, a dummy multivariate model, and the baseline GP-Copula model. The results show that CRPS-Sum provides a misleading assessment of both dummy methods.

 

This paper is well written. The paper could be accepted after making the following changes.

  1. We have not seen many papers on evaluation metrics for probabilistic forecasting, especially multivariate probabilistic forecasting. It is interesting to see a paper that challenges the newly proposed CRPS-Sum metric for multivariate probabilistic forecasting models. It would be helpful to review the probabilistic forecasting models in the literature.
  2. Probabilistic forecasts take the form of quantiles, distributions, or scenarios. It would be helpful to perform case studies on all three forms to show that CRPS-Sum provides a misleading assessment.
  3. The authors adopted GP-Copula as the baseline model since it could be validated against reference [11]. It would be helpful to include other probabilistic forecasting models as baselines, including parametric and non-parametric methods. In addition, different copula models may also affect the results.
  4. It would be helpful to add one or two more case-study datasets to make the conclusion more persuasive.
  5. Please define GP before using the abbreviation.
  6. There are some grammatical errors and typos in the paper. The authors need to read the paper carefully and make corrections.

 

 

Author Response

Dear reviewer,

Thank you very much for your constructive feedback. Below, we list our responses to each of your comments. We hope you find them useful.

  1. We have not seen many papers on evaluation metrics for probabilistic forecasting, especially multivariate probabilistic forecasting. It is interesting to see a paper that challenges the newly proposed CRPS-Sum metric for multivariate probabilistic forecasting models. It would be helpful to review the probabilistic forecasting models in the literature.
    • As you suggested, the related work is extended in the final draft.
  2. Probabilistic forecasts take the form of quantiles, distributions, or scenarios. It would be helpful to perform case studies on all three forms to show that CRPS-Sum provides a misleading assessment.
    • This is a very good point and would be an excellent direction to explore in the future. However, the CRPS-Sum used in this study is defined for quantiles and scenarios (sample-based); assessing models with direct access to the distribution (explicit generative models) is a potential direction for further investigation. All experiments in sections 4 and 5 are conducted using a sample-based method via Monte Carlo. In section 3.1, however, we illustrate that CRPS yields almost the same values whether it is computed from quantiles or from samples. Since CRPS-Sum is defined based on CRPS, this also holds for CRPS-Sum, which our experiments validate empirically. Furthermore, we observe this consistency in our experiments in section 5. Following the reviewer's feedback, we have updated the paper to make this more explicit.
  3. The authors adopted GP-Copula as the baseline model since it could be validated against reference [11]. It would be helpful to include other probabilistic forecasting models as baselines, including parametric and non-parametric methods. In addition, different copula models may also affect the results.
    • The main concern of this work is studying the capabilities of CRPS-Sum. In section 4, we provide multiple empirical and theoretical grounds for the shortcomings of this assessment method. In section 5, we present an example to show that the issues discussed theoretically in section 4 can occur (and have already occurred) in real-world scenarios. Additionally, to increase the credibility of our discussion, we aimed to employ models proposed by others and to refrain from developing models ourselves. Unfortunately, we could not find any other work that uses CRPS-Sum and whose code is publicly available, which is not surprising since CRPS-Sum is quite new. Finally, adding new methods is not possible given the 10-day limit for updating the paper.
  4. It would be helpful to add one or two more case-study datasets to make the conclusion more persuasive.
    • We have added new datasets to the final draft, as you suggested. However, our reasons for initially limiting ourselves to the exchange dataset were:
      1. The small number of channels in the exchange dataset lets us illustrate the model performance on the entire test set over all channels and compare qualitative and quantitative results.
      2. The small number of channels in the exchange dataset lets us obtain reliable values from the energy score as the basis of comparison.
    • The other datasets that GP-Copula has been applied to have more than 50 channels, which makes it hard to provide a meaningful discussion.
  5. Please define GP before using the abbreviation.
    • We applied the feedback in the final draft.
  6. There are some grammatical errors and typos in the paper. The authors need to read the paper carefully and make corrections.
    • We applied the feedback in the final draft.
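
The claim in response 2, that CRPS takes almost the same value whether it is computed from quantiles or from samples, can be illustrated with a minimal sketch. This is not the authors' code. It assumes the standard sample-based estimator E|X - y| - 0.5 E|X - X'| and the quantile (pinball) loss form of CRPS, and the function names, the Gaussian toy forecast, and the single-time-step `crps_sum` (without the time averaging or normalization a paper may apply) are all hypothetical choices made for illustration.

```python
import numpy as np

def crps_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    samples = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(samples - y))
    # Mean absolute difference over all pairs of forecast samples.
    term_spread = np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term_obs - 0.5 * term_spread

def crps_quantiles(samples, y, levels):
    """Quantile-based CRPS estimate: twice the mean pinball loss over levels."""
    q = np.quantile(samples, levels)
    diff = y - q
    pinball = np.maximum(levels * diff, (levels - 1.0) * diff)
    return 2.0 * np.mean(pinball)

def crps_sum(samples, y):
    """CRPS of the forecast summed over dimensions, for one time step.

    samples: array of shape (n_samples, n_dims); y: array of shape (n_dims,).
    """
    samples = np.asarray(samples, dtype=float)
    return crps_samples(samples.sum(axis=1), np.sum(y))

# Toy univariate forecast (assumption): standard normal samples, observation 0.3.
rng = np.random.default_rng(0)
forecast = rng.normal(0.0, 1.0, size=2000)
levels = np.linspace(0.005, 0.995, 100)  # midpoints of 100 equal-width bins

crps_from_samples = crps_samples(forecast, 0.3)
crps_from_quantiles = crps_quantiles(forecast, 0.3, levels)
print(crps_from_samples, crps_from_quantiles)
```

In this sketch the two estimates should differ only by a small discretization error, and because `crps_sum` simply calls the CRPS estimator on the summed series, the same agreement carries over to it.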

                    

Best regards,
Authors

Reviewer 2 Report

This work presents a systematic evaluation of CRPS-Sum to understand its discrimination ability, shows that the statistical properties of the target data affect the discrimination ability of CRPS-Sum, and highlights that the CRPS-Sum calculation overlooks the performance of the model on each dimension. Overall, the paper is very interesting and novel. However, the authors need to address the following points and revise the paper accordingly.

  1. It is better to make a more comprehensive literature review in the form of a table (matrix) so that the reader is more confident with the contribution of this research.
  2. The performance needs to be presented in a more quantitative manner.
  3. Add more results to validate the proposed work and compare those with the existing analysis/work. Moreover, the computational effort and accuracy of the proposed work should be compared with a benchmark method and other existing work to justify its effectiveness.
  4. Explain in brief how the present paper differs from the published ones.
  5. Present the proof of sensitivity and robustness formulations/analysis of the proposed work and validation.
  6. It is necessary that the authors should illustrate/present the details of the proposed work, modeling/design and data of the studied power system, system constraints/data/parameters, etc. Moreover, state the system constraints, in other words, the upper and the lower boundaries of the optimization algorithm/system variables, etc.
  7. What are the limitations and disadvantages of the proposed work?

Author Response

Dear reviewer,

Thank you very much for your constructive feedback. Below, we list our responses to each of your comments. We hope you find them useful.

  1. It is better to make a more comprehensive literature review in the form of a table (matrix) so that the reader is more confident with the contribution of this research.
    • The related work is extended in the final draft as you suggested.
  2. The performance needs to be presented in a more quantitative manner.
    • We studied the assessment capabilities of CRPS-Sum by comparing it with the energy score, which is well studied and whose advantages and disadvantages are clear (listed in section 3.2). We performed our study over an extended set of configurations and scenarios. To make the quantitative results easier to digest, we present them as relative scores and visualize them instead of reporting the raw numbers. For instance, every point in figure 2 is the quantitative result of a standalone experiment.
  3. Add more results to validate the proposed work and compare those with the existing analysis/work. Moreover, the computational effort and accuracy of the proposed work should be compared with a benchmark method and other existing work to justify its effectiveness.
    • The term "proposed work" hardly applies to this paper, since we conducted an extended study of existing work and did not propose any new assessment method. Hence, we assume that "the proposed work" means CRPS-Sum. Under this assumption, section 5 compares CRPS-Sum with CRPS and ES, two existing assessment methods. Furthermore, section 3.1 explains the computational difficulty of CRPS, which also applies to CRPS-Sum since CRPS-Sum is defined based on CRPS (as noted in section 3.3). We will make this clearer in the final draft.
  4. Explain in brief how the present paper differs from the published ones.
    • Section 5 compares CRPS-Sum to ES and CRPS on a real dataset while section 4 compares them on a series of toy experiments.
  5. Present the proof of sensitivity and robustness formulations/analysis of the proposed work and validation.
    • Assuming that the proposed work means CRPS-Sum, in this paper we tried to illustrate, through theoretical reasoning and experiments, that there is a severe problem with the sensitivity and robustness of CRPS-Sum.
  6. It is necessary that the authors should illustrate/present the details of the proposed work, modeling/design and data of the studied power system, system constraints/data/parameters, etc. Moreover, state the system constraints, in other words, the upper and the lower boundaries of the optimization algorithm/system variables, etc.
    • The definition of CRPS-Sum is presented in section 3.3. As explained there, CRPS-Sum is defined based on CRPS and we have discussed CRPS in section 3.1 extensively with multiple references to other studies for further information.
  7. What are the limitations and disadvantages of the proposed work?
    • Assuming that the proposed work means CRPS-Sum, section 6 provides a summary of the findings from our experiments and lists all limitations and disadvantages together.
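
The point that CRPS-Sum overlooks per-dimension performance, while the energy score (ES) does not, can be illustrated with a toy sketch. It is not taken from the paper's experiments. It assumes CRPS-Sum is the CRPS of the forecast summed over dimensions and uses the sample-based estimators for both scores; the two-dimensional Gaussian forecast, the column swap, and all function names are hypothetical. Because a sum does not depend on the order of its terms, a forecast with its dimensions swapped receives the same CRPS-Sum as the correct one.

```python
import numpy as np

def crps_samples(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    samples = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(samples - y))
    term_spread = np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term_obs - 0.5 * term_spread

def crps_sum(samples, y):
    """CRPS of the forecast summed over dimensions, for one time step."""
    return crps_samples(samples.sum(axis=1), np.sum(y))

def energy_score(samples, y):
    """Sample-based energy score: E||X - y|| - 0.5 * E||X - X'||."""
    term_obs = np.mean(np.linalg.norm(samples - y, axis=1))
    pairwise = samples[:, None, :] - samples[None, :, :]
    term_spread = np.mean(np.linalg.norm(pairwise, axis=2))
    return term_obs - 0.5 * term_spread

# Toy setup (assumption): two dimensions with very different levels.
rng = np.random.default_rng(0)
n = 500
y = np.array([0.0, 10.0])
good = np.column_stack([rng.normal(0.0, 1.0, n), rng.normal(10.0, 1.0, n)])
swapped = good[:, ::-1]  # same samples, assigned to the wrong dimensions

crps_sum_good = crps_sum(good, y)
crps_sum_swapped = crps_sum(swapped, y)
es_good = energy_score(good, y)
es_swapped = energy_score(swapped, y)

# CRPS-Sum cannot tell the two forecasts apart; the energy score can.
print(crps_sum_good, crps_sum_swapped)
print(es_good, es_swapped)
```

Under these assumptions the swapped forecast is clearly wrong in every dimension, yet only the energy score penalizes it.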

Best regards,

Authors

Round 2

Reviewer 1 Report

The authors have satisfactorily addressed most of my concerns. This revised manuscript should be ready to go.
