Article
Peer-Review Record

Using Deep Reinforcement Learning with Hierarchical Risk Parity for Portfolio Optimization

Int. J. Financial Stud. 2023, 11(1), 10; https://doi.org/10.3390/ijfs11010010
by Adrian Millea * and Abbas Edalat
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 10 October 2022 / Revised: 17 December 2022 / Accepted: 23 December 2022 / Published: 29 December 2022

Round 1

Reviewer 1 Report

Using Deep Reinforcement Learning with Hierarchical Risk Parity for Portfolio Optimization

 

1. INTRODUCTION

I would like to request that you insert background and motivation, address gaps and lapses, and mention the methodological and empirical contributions, policy implications, and the utility and applications of your findings and contributions.

 2. RELATED WORKS

Clearly mention which gaps and limitations you identified, and then what steps you have taken to fill them.

 

3. System architecture

What is the motivation for this methodology in particular, and how does it align with your study? Please address these comprehensively.

 

4. Results

This section is clear and understandable.

 

DISCUSSION

Please add a new section validating your results; I would like to see it. Compare your findings and contributions to those of relevant studies, and state your novelties.

Explain how your findings and contributions could help mitigate market disasters arising from the Ukraine-Russia conflict and COVID-19.

 

5. CONCLUSION

Summarize the main results and findings, and illustrate the contributions.

Comprehensively describe the policy implications, utility, and applications of this work, and which stakeholders would benefit from studying this paper; please give examples (minimum two paragraphs).

Author Response

Thank you for your feedback and comments. Please see the answers below.

 

1. INTRODUCTION

I would like to request that you insert background and motivation, address gaps and lapses, and mention the methodological and empirical contributions, policy implications, and the utility and applications of your findings and contributions.

(Adding at line 44)

We would like to bridge the gap between these two quite different approaches, trying to get the best of both worlds. DRL provides flexibility and adaptability at the expense of extended computation time and large variance, whereas HC approaches provide reliability and smaller variance and require less computation time.

One major disadvantage of the DRL approach to the portfolio optimization problem is the continuous multi-dimensional action space needed to model the portfolio allocation weights, which is notoriously hard to explore. The historical time window fed to the agent, as well as the number of assets, also needs to be reasonably small so as not to increase the dimensionality of the state space too much, since a DRL algorithm needs many iterations to converge and the duration of each iteration depends directly on the state-space dimensionality.

The HC approach, on the other hand, is very fast but quite limited and strongly influenced by the covariance estimation period; some HC models can produce very different results depending on this period. In our work we get the best of each approach in the following sense: (i) we use a DRL agent at a higher level of decision-making, with a discrete action space, thus overcoming the exploration issue of a continuous multi-dimensional action space, and (ii) we avoid the large variance by using, as low-level agents, HC models with different covariance estimation periods which interact with the environment. Thus, the high-level DRL agent selects among a set of low-level agents, which in turn are fast HC models that give reasonable performance. We could say we are combining weak learners (the HC models) with DRL to get a strong learner (the overall system).
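To make the two-level structure concrete, here is a minimal Python sketch of the decision loop under the assumptions above; all names (HCModel, high_level_policy, the inverse-variance placeholder allocation, the window sizes) are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

class HCModel:
    """Low-level allocator with a fixed covariance estimation window.
    The inverse-variance weighting below is only a stand-in for a real
    hierarchical-clustering model such as HRP."""
    def __init__(self, window: int):
        self.window = window

    def allocate(self, asset_returns: np.ndarray) -> np.ndarray:
        recent = asset_returns[-self.window:]        # covariance estimation period
        inv_var = 1.0 / (recent.var(axis=0) + 1e-8)  # per-asset inverse variance
        return inv_var / inv_var.sum()               # weights summing to 1

# One low-level agent per (assumed) covariance estimation period.
hc_models = [HCModel(window=w) for w in (30, 60, 90, 120)]

def decision_step(high_level_policy, state, asset_returns, current_weights):
    """The high-level DRL agent picks a discrete action: one of the
    HC models, or 'hold' (keep the current allocation)."""
    action = high_level_policy(state)      # integer in [0, len(hc_models)]
    if action == len(hc_models):           # the extra 'hold' action
        return current_weights
    return hc_models[action].allocate(asset_returns)
```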

By combining the two techniques in this way, the agent is able to adapt to changing market regimes, dealing well with both bear and bull sessions, which enables a much larger testing period without any retraining; in fact, it removes the need for retraining altogether.

 

(Adding at line 75)

Our work shows the strength of having more decision-making levels and opens up a new class of trading algorithms in which we have specialised agents (this specificity can come from multiple sources, e.g. they can be risk-focused or dataset-driven) which work best in their representative scenarios, and we then flexibly combine them, based on another criterion, to maximise overall profits. Numerous applications are possible; in principle, any asset allocation problem can be modelled in this way, assuming the existence of good-enough low-level agents (weak learners).

This also shows a way of avoiding the cumbersome high-dimensional continuous action space. Moreover, the approach scales well when adding other asset classes (assuming each new class will need at least a few different base models), adding only one discrete action to the DRL agent for each new model. In short, we can scale our system to a very large number of assets (assuming we can partition them pertinently into different asset classes) with minimal overhead for the DRL agent.

 

 2. RELATED WORKS

Clearly mention which gaps and limitations you identified, and then what steps you have taken to fill them.

(Adding to line 161)

The new DRL techniques applied to the portfolio management problem seemed to be reinventing the wheel, with some approaches even incorporating covariance estimation. Most of them used some of the existing theory but none of the existing practical models. The models coming from the financial community and those from DRL seemed to be mutually exclusive, both tackling the same issue: you could use one or the other. This is where we try to bridge the gap, by employing the more traditional HC models from the financial community as a starting point and adding the flexibility of DRL on top, while avoiding its large computational requirements, large variance, and instability by not using a continuous action space or a complex state space. Our DRL agent works on a small number of state dimensions, the recent performance of all models, and as actions it selects among the available models: 12 + 1 (hold) possible discrete actions.
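As an illustration of how compact these spaces are, the following sketch defines them with the Gym API; the 12 + 1 action count comes from the text, while the 20-step lookback is an assumed value.

```python
import numpy as np
from gym import spaces  # the gymnasium package exposes the same API

N_MODELS = 12   # number of low-level HC models (from the text)
LOOKBACK = 20   # assumed length of the "recent performance" window

# Discrete action space: choose one of the 12 HC models, or hold (index 12).
action_space = spaces.Discrete(N_MODELS + 1)

# State: only the recent performance of all models, a small fixed-size
# array, instead of a high-dimensional history of raw asset prices.
observation_space = spaces.Box(low=-np.inf, high=np.inf,
                               shape=(LOOKBACK, N_MODELS), dtype=np.float32)
```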

 

3. System architecture

What is the motivation for this methodology in particular, and how does it align with your study? Please address these comprehensively.

(Adding at line 194)

We wanted to combine the flexibility of DRL with the efficiency and robustness of HC approaches. The motivation is to harness the strengths of DRL while overcoming its drawbacks: a high-dimensional continuous action space, a large state space, and large variance in performance. The whole system is designed to deal adaptively with switching market regimes while avoiding the need for retraining, and it does so successfully, as seen in the next section.

 

(Adding at line 204)

We first run the low-level models, the 12 HC models, on the market to get their performance on the full dataset (Figure 4). We see that the performance is worse than the buy-and-hold strategy for most HC models. We save these models' performances to a file (to avoid recomputing them every time they are requested by the DRL agent in the training phase) and load them in each RL experiment. The performances of all models function as state information for the DRL agent, indexed by the step we are at in the dataset. We also tried adding the returns of all assets, or using only the returns as the state, but that produced worse performance than just using the recent performance of all models.
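A possible caching scheme matching this description is sketched below; the file name, the lookback length, and the run_model helper (which would back-test a single HC model over the dataset) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def precompute_performance(hc_models, market_data, path="hc_performance.npy"):
    """Run each HC model once over the full dataset and cache the per-step
    performance matrix, shape (n_steps, n_models).
    run_model is a hypothetical helper back-testing a single model."""
    perf = np.stack([run_model(m, market_data) for m in hc_models], axis=1)
    np.save(path, perf)
    return perf

def state_at(perf: np.ndarray, step: int, lookback: int = 20) -> np.ndarray:
    """DRL state at a dataset step: the recent performance of all models,
    indexed by the step we are at."""
    return perf[step - lookback:step]

# Each RL experiment then simply loads the cached file:
# perf = np.load("hc_performance.npy")
```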

 

4. Results

This section is clear and understandable.

 

DISCUSSION

Please add a new section validating your results; I would like to see it. Compare your findings and contributions to those of relevant studies, and state your novelties.

Explain how your findings and contributions could help mitigate market disasters arising from the Ukraine-Russia conflict and COVID-19.

 

Discussion (new section added)

From evaluating our system without any retraining on such large testing sets, which go through different market regimes, we can reliably say that our system works across more market conditions. Even seminal works like [1, 2] use a few very small, isolated testing sets (50 days) with retraining in between. Some do use larger testing sets of 1-2 years, but with training sets that are larger still (3 to 9 times) [4, 5]. Only in [3] do we see truly large testing sets of a similar size to the training set, but on a different set of assets (a mix of futures), and they still use retraining. We also used different markets and different dataset sizes, adding to the robustness of the evaluation. We thus add to the body of evidence that the market switches between different dynamics/regimes and that an explicit adaptive mechanism which deals with this switching is highly fruitful.

A system with more decision-making layers can include additional useful information at selected layers, thus enabling structured processing of news data or macroeconomic indicators in a more meaningful way. The Ukraine-Russia conflict and the pandemic are examples of macro-events whose sound incorporation would clearly benefit a trading system.

 

(These will be properly added in the text)

[1] A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem

[2] Deep Direct Reinforcement Learning for Financial Signal Representation and Trading

[3] Deep Reinforcement Learning for Trading

[4] Deep Robust Reinforcement Learning for Practical Algorithmic Trading

[5] An Application of Deep Reinforcement Learning to Algorithmic Trading

 

5. CONCLUSION

Summarize the main results and findings, and illustrate the contributions.

Comprehensively describe the policy implications, utility, and applications of this work, and which stakeholders would benefit from studying this paper; please give examples (minimum two paragraphs).

(Adding to line 289)

We devised a new architecture leveraging the flexibility of DRL and the reliability and efficiency of HC models. We avoided the notoriously hard-to-explore multi-dimensional action space generally used in the DRL trading literature by having the HC models perform the asset allocation while the DRL agent selects the one that appears best (based on recent performance) to act in the real environment.

The applications are numerous, with the system being quite flexible and open to further research. One could use various base models, or different reward functions (or combinations thereof) for the DRL agent. One could even include a human in the loop as an additional low-level agent. Moreover, adding or removing assets does not require retraining the DRL agent, since the HC models handle the actual asset allocation.

Reviewer 2 Report

Line 16: citation numbers are missing.

Line 16: "there have been".

Line 48: "initially designed for ??". All citations are missing and replaced with "?"; maybe I received a bad compilation.

Line 170: in Table 1, "ForEx" is usually spelled "Forex".

Line 171: "we have (a) developed".

 

 

Author Response

Thank you for your comments.

The issues have been corrected.

Reviewer 3 Report

The paper applies new trading algorithms, based on the concepts of hierarchical risk parity and deep reinforcement learning, to different asset classes and claims superior performance for its trading architecture. The authors also claim computational superiority and longer out-of-sample performance periods for their trading algorithms. However, the paper needs a major revision. Here are some of my comments.

1. Overall, the paper is poorly written and needs revision in each section, from the introduction to the conclusion. For example, the introduction should explain how the contribution of this paper fits into the existing literature and state the paper's contribution more clearly.

2. The methodology section should explain more clearly how the technique is being applied.

3. The presentation of the results can be improved significantly. For example, the authors can explain what they are presenting in each graph of Figure 4, and explain Figure 5 more clearly.

4. The authors should explain more clearly why they chose different training and testing periods for different asset classes. Please ensure that this is not cherry-picking. How does the performance of these models change with the frequency of the data and the periods considered for training and testing? Also, explain the rationale for the individual assets considered. Consider adding more text/results to address these issues.

5. To improve the validity of the results, I would suggest the authors consider different subsample periods, such as normal, high-, and low-volatility periods, bear-market periods, and bull-market periods. Consider different time windows for the training period as well as for the testing period.

6. I would suggest presenting the performance of each model in Section 3.1 relative to the buy-and-hold portfolio.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

Thank you for your revised work.
