Smart Tangency Portfolio: Deep Reinforcement Learning for Dynamic Rebalancing and Risk–Return Trade-Off
Abstract
1. Introduction
2. Literature Review
2.1. Portfolio Optimization and Dynamic Rebalancing
2.2. Deep Reinforcement Learning
2.3. Deep Reinforcement Learning for Stock Trading and Portfolio Management
2.4. Synthesis and Research Gap
3. Problem Setup and Proposed Methodology
3.1. Portfolio Optimization Framework
3.2. Reinforcement Learning
Algorithm 1: PPO Clip
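The body of Algorithm 1 did not survive extraction, so what follows is only a minimal sketch of the standard PPO-Clip surrogate objective as it is usually stated, not the authors' implementation. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are assumptions made for illustration.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps is an assumed default.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    # Probability ratio pi_new(a|s) / pi_old(a|s)
    ratio = np.exp(logp_new - logp_old)
    # Ratio restricted to [1 - eps, 1 + eps]
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Element-wise pessimistic choice between the raw and clipped terms
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    return float(np.mean(surrogate))
```

Taking the minimum removes any incentive to push the ratio outside the clip range when doing so would increase the objective, which is what keeps each policy update close to the previous policy.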
3.3. Dynamic Portfolio Optimization Using Deep Reinforcement Learning
3.3.1. Action Space
3.3.2. Observation Space
3.3.3. Trading Agent and Environment
3.4. Ensemble Model
4. Empirical Experiments
4.1. Data Collection
4.2. Benchmark Setup
- Benchmark ETFs: the out-of-sample performance of DIA and SPY during the testing period;
- Mean–Variance Tangency Portfolio: portfolio maximizing return/variance using the latest 252-day return history and re-optimized every 30 trading days;
- Mean–CVaR Tangency Portfolio: portfolio maximizing return/CVaR (at the 95% confidence level) based on the latest 252-day return history and re-optimized every 30 trading days;
- Mean–Semivariance Tangency Portfolio: portfolio maximizing return/Semivariance using the latest 252-day return history and re-optimized every 30 trading days.
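The three tangency benchmarks above differ only in the risk measure used in the denominator. A minimal sketch of those ratio objectives and of the 252-day look-back, 30-day re-optimization schedule is given below. All function names are hypothetical, the historical-simulation CVaR and the zero target for semivariance are assumptions, and the constrained optimizer that would maximize the ratio is deliberately left out.

```python
import numpy as np

def tail_loss_cvar(port_returns, level=0.95):
    """Historical CVaR: mean loss at or beyond the `level` loss quantile (positive = loss)."""
    losses = -np.asarray(port_returns, dtype=float)
    var = np.quantile(losses, level)
    return float(losses[losses >= var].mean())

def semivariance(port_returns, target=0.0):
    """Mean squared shortfall below `target` (target of zero is an assumption)."""
    shortfall = np.minimum(np.asarray(port_returns, dtype=float) - target, 0.0)
    return float(np.mean(shortfall ** 2))

def ratio_objective(weights, window_returns, risk="variance"):
    """Mean portfolio return divided by the chosen risk measure over one look-back window."""
    port = np.asarray(window_returns, dtype=float) @ np.asarray(weights, dtype=float)
    if risk == "variance":
        denom = port.var(ddof=1)
    elif risk == "cvar":
        denom = tail_loss_cvar(port)
    elif risk == "semivariance":
        denom = semivariance(port)
    else:
        raise ValueError(f"unknown risk measure: {risk}")
    return float(port.mean() / denom)

def rebalance_windows(n_days, lookback=252, step=30):
    """Yield (start, end) row indices of the look-back window at each re-optimization date."""
    for t in range(lookback, n_days, step):
        yield t - lookback, t
```

At each `(start, end)` pair, a solver of choice would search for the weights that maximize `ratio_objective` on `returns[start:end]` and hold them for the next 30 trading days.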
4.3. Trading Agents Training and Rolling Out-of-Sample Tests
- (1) The risk-aversion index, which specifies the desired position on the efficient frontier for the selected optimization objective;
- (2) The rebalancing horizon, indicating the number of trading days until the next portfolio optimization.
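One way to picture the two action components is as a mapping from a raw continuous policy output to a frontier position and an integer number of days. The sketch below is purely illustrative: the raw range of [-1, 1], the [0, 1] scale for the risk-aversion index, and the 30-day cap on the horizon are assumptions, not values taken from the paper.

```python
import numpy as np

def decode_action(action, max_horizon=30):
    """Map a raw two-dimensional action to (risk_aversion_index, rebalancing_horizon).

    Assumed conventions (hypothetical): raw components lie in [-1, 1], the index is
    rescaled to [0, 1], and the horizon is an integer in [1, max_horizon].
    """
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    # First component: position along the efficient frontier
    risk_aversion_index = float((a[0] + 1.0) / 2.0)
    # Second component: trading days until the next portfolio optimization
    horizon = int(round(1 + (a[1] + 1.0) / 2.0 * (max_horizon - 1)))
    return risk_aversion_index, horizon
```

Clipping before rescaling keeps out-of-range policy outputs from producing an invalid horizon.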
4.4. Out-of-Sample Results
4.4.1. Results for the Sector ETF Universe
4.4.2. Result for the Dow Jones Industrial Average Universe
4.4.3. Statistical Inference and Model Robustness
- Test on benchmark-adjusted alpha under HAC (Newey–West heteroskedasticity- and autocorrelation-consistent) standard errors: $H_0: \alpha = 0$ vs. $H_1: \alpha \neq 0$. $\alpha$ is the intercept of the regression $r_{s,t} = \alpha + \beta\, r_{b,t} + \varepsilon_t$ and is annualized by multiplying by 252; $r_{s,t}$ and $r_{b,t}$ are the daily OOS returns of the DRL strategy and the benchmark.
- Test on the Sharpe ratio difference using the stationary bootstrap: $H_0: \Delta SR \le 0$ vs. $H_1: \Delta SR > 0$. We test whether the strategy improves risk efficiency relative to the benchmark by evaluating the Sharpe ratio difference, defined as $\Delta SR = SR_s - SR_b$; each $SR$ is the Sharpe ratio calculated using the Politis–Romano stationary bootstrap, which preserves serial dependence (expected block length = 20 trading days).
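Both tests can be sketched in a few lines of NumPy. The code below is a rough illustration under stated assumptions rather than the authors' procedure: the lag count of 5 for the Newey–West estimator, the percentile-based one-sided p-value, the 95% percentile interval, and all function names are choices made here for illustration. Only the expected block length of 20 days and the annualization by 252 come from the text.

```python
import numpy as np

def hac_alpha(strategy, benchmark, lags=5, periods=252):
    """OLS of strategy on benchmark returns; annualized alpha and its Newey-West t-statistic."""
    y = np.asarray(strategy, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(benchmark, dtype=float)])
    coef = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ coef
    Xu = X * resid[:, None]                  # per-observation score x_t * u_t
    S = Xu.T @ Xu                            # lag-0 term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)    # Bartlett kernel weight
        G = Xu[lag:].T @ Xu[:-lag]
        S += weight * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ S @ XtX_inv              # sandwich covariance of the coefficients
    return coef[0] * periods, coef[0] / np.sqrt(cov[0, 0])

def stationary_bootstrap_indices(n, mean_block, rng):
    """Politis-Romano resampling: geometric block lengths, wrapping around the sample."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    restart = rng.random(n) < 1.0 / mean_block
    for t in range(1, n):
        idx[t] = rng.integers(n) if restart[t] else (idx[t - 1] + 1) % n
    return idx

def sharpe(r, periods=252):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    r = np.asarray(r, dtype=float)
    return np.sqrt(periods) * r.mean() / r.std(ddof=1)

def bootstrap_delta_sharpe(strategy, benchmark, n_boot=1000, mean_block=20, seed=0):
    """Observed Sharpe difference, one-sided bootstrap p-value, and percentile interval."""
    s = np.asarray(strategy, dtype=float)
    b = np.asarray(benchmark, dtype=float)
    rng = np.random.default_rng(seed)
    observed = sharpe(s) - sharpe(b)
    draws = np.empty(n_boot)
    for i in range(n_boot):
        idx = stationary_bootstrap_indices(len(s), mean_block, rng)
        draws[i] = sharpe(s[idx]) - sharpe(b[idx])   # shared indices keep the pairing
    p_value = float(np.mean(draws <= 0.0))           # assumed convention for the p-value
    ci = np.percentile(draws, [2.5, 97.5])           # assumed 95% percentile interval
    return observed, p_value, ci
```

Resampling the strategy and benchmark with the same indices preserves their contemporaneous correlation, which matters because the two Sharpe ratios are strongly dependent.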
4.4.4. Turnover Dynamics and Transaction Cost Impact
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A


Appendix B. Technical Indicator Feature Definitions
| Feature Set | Definition | Windows/Parameters |
|---|---|---|
| Aroon Oscillator | | |
| Awesome Oscillator | | 5 and 34 |
| Bollinger Band Upper Bound | | |
| Bollinger Band Lower Bound | | |
| Chande Momentum Oscillator | | |
| Commodity Channel Index—30 periods | | |
| Correlation Trend Indicator | | |
| Directional Index—30 periods | Wilder’s ADX, smoothed with Wilder’s moving average; the TA-Lib implementation is used. | |
| Elder–Ray Index–Bull Power | | 13 |
| Elder–Ray Index–Bear Power | | 13 |
| Inertia Indicator | Trend persistence measure based on Relative Vigor Index applied to a detrended price oscillator. | |
| Kaufman’s Efficiency Ratio | | |
| Know Sure Thing | | Pring defaults |
| Linear Regression of Close (Window 10) | | |
| Log Return | | 1 |
| Moving Average Convergence Divergence | | 12, 26, 9 |
| Percentage Volume Oscillator | | 12, 26 |
| Pretty Good Oscillator | | |
| Psychological Line | | |
| Quantitative Qualitative Estimation (QQE) | RSI smoothed by EMA and bounded by ATR-based dynamic bands (standard definition). | RSI base 14–30 |
| Relative Strength Index—30 periods | | |
| Relative Vigor Index | | |
| Simple Moving Average of Close—10 periods | | 10 |
| Simple Moving Average of Close—20 periods | | 20 |
| Simple Moving Average of Close—100 periods | | 100 |
| Stochastic RSI | | |
| Super Trend Upper Bound | | |
| Super Trend Lower Bound | | |
| Gaussian Fisher Transform Price Reversals Indicator | | |
| Triple Exponential Moving Average | | |
| Volume Variation Index | | |
| Z-Score of Close Price—75 periods | | |
| Adjusted Close Price | Adjusted for splits and dividends; base series. | — |
| Percentage Change in Close Price | | 1 |
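Most definitions in the table above were lost in extraction. For the simplest price-based features (log return, percentage change, simple moving average, and rolling z-score), a minimal NumPy sketch using the textbook formulas is shown below. The function names are hypothetical, and the use of the population standard deviation in the z-score is an assumption; the study's own features may follow different conventions.

```python
import numpy as np

def log_return(close):
    """One-period log return; the first element is NaN."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    out[1:] = np.log(close[1:] / close[:-1])
    return out

def pct_change(close):
    """One-period simple return; the first element is NaN."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    out[1:] = close[1:] / close[:-1] - 1.0
    return out

def rolling_mean(x, window):
    """Simple moving average; NaN until a full window is available."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def rolling_zscore(x, window):
    """(value - window mean) / window std, using the population std (an assumption)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    for t in range(window - 1, len(x)):
        w = x[t - window + 1 : t + 1]
        sd = w.std(ddof=0)
        out[t] = (x[t] - w.mean()) / sd if sd > 0 else np.nan
    return out
```

With these helpers, `rolling_mean(close, 10)`, `rolling_mean(close, 20)`, `rolling_mean(close, 100)`, and `rolling_zscore(close, 75)` would correspond to the window lengths listed in the table.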
Appendix C
Appendix C.1. Best Strategy Portfolio Out-of-Sample Metrics


Appendix C.2. Turnover and Transaction Cost Analysis

| 1 | https://spinningup.openai.com/en/latest/algorithms/ppo.html#id8 (accessed on 25 September 2025). |
| 2 | Since the DJIA composition changes frequently—for example, Amazon replaced Walgreens Boots Alliance in 2024 and Salesforce, Amgen, and Honeywell joined in 2020—we assume a static universe to keep the analysis consistent across the sample period. |
| 3 | The DJIA panel begins in 2005 to ensure continuous histories for all 28 retained names. With a 252-day feature look-back and rolling two-year train–validate–test design, the first ETF out-of-sample test begins in January 2007, and the first DJIA test starts in January 2011. |
| 4 | All portfolio metrics are derived from daily log returns for both the portfolio and its benchmark (SPY for the ETF universe and DIA for the DJIA universe) using functions from the PerformanceAnalytics package (version 2.0.4) in R (version 4.4.1). Annual return is calculated from the average daily return and annualized based on 252 trading days. Annual volatility is obtained from the standard deviation of daily returns, scaled by $\sqrt{252}$. Value-at-Risk (VaR, 95%) is computed through historical simulation as the fifth percentile of daily returns, representing the worst expected loss on a typical day at a 95% confidence level. Conditional Value-at-Risk (CVaR, 95%) represents the mean of losses exceeding the VaR threshold, reflecting expected tail risk. Maximum drawdown measures the greatest cumulative decline from a portfolio’s peak value to its subsequent trough. Sharpe ratio is the annualized mean excess return divided by annualized volatility, assuming a zero risk-free rate. Sortino ratio refines this by dividing the annualized mean return by downside deviation, using a minimum acceptable return of zero to isolate negative volatility. Upside and downside capture ratios compare the portfolio’s returns to the benchmark’s performance during positive and negative benchmark periods, respectively, indicating relative participation in gains and protection during losses. Beta is estimated from a regression of daily portfolio returns on benchmark returns, indicating systematic market exposure. Tracking error measures the annualized standard deviation of active returns (portfolio minus benchmark). Information ratio captures risk-adjusted active performance by dividing annualized active return by tracking error. |
| 5 | The adjusted close price is used throughout the analysis as it accounts for corporate actions such as stock splits, dividends, and distributions, providing a more accurate measure of total return than the raw close price. The adjusted close price is retrieved from Yahoo! Finance. |
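Several of the metric definitions in note 4 translate directly into short formulas. The sketch below restates a subset of them in NumPy for illustration. It is not the PerformanceAnalytics code used in the study, the function names are hypothetical, and edge conventions (for example the quantile interpolation rule) may differ from the R package.

```python
import numpy as np

TRADING_DAYS = 252

def annual_return(r):
    """Average daily return scaled by 252."""
    return float(np.mean(r) * TRADING_DAYS)

def annual_volatility(r):
    """Daily standard deviation scaled by sqrt(252)."""
    return float(np.std(r, ddof=1) * np.sqrt(TRADING_DAYS))

def historical_var(r, level=0.95):
    """Historical-simulation VaR: the (1 - level) quantile of daily returns."""
    return float(np.quantile(np.asarray(r, dtype=float), 1.0 - level))

def historical_cvar(r, level=0.95):
    """Mean of daily returns at or below the VaR threshold."""
    r = np.asarray(r, dtype=float)
    return float(r[r <= historical_var(r, level)].mean())

def max_drawdown(log_returns):
    """Largest peak-to-trough decline of the wealth path implied by daily log returns."""
    wealth = np.concatenate([[1.0], np.exp(np.cumsum(log_returns))])
    peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / peak))

def sharpe_ratio(r):
    """Annualized mean return over annualized volatility, zero risk-free rate."""
    return annual_return(r) / annual_volatility(r)

def beta(r, b):
    """Slope of the regression of portfolio returns on benchmark returns."""
    return float(np.cov(r, b, ddof=1)[0, 1] / np.var(b, ddof=1))

def tracking_error(r, b):
    """Annualized standard deviation of active returns."""
    return annual_volatility(np.asarray(r, dtype=float) - np.asarray(b, dtype=float))

def information_ratio(r, b):
    """Annualized active return divided by tracking error."""
    active = np.asarray(r, dtype=float) - np.asarray(b, dtype=float)
    return annual_return(active) / tracking_error(r, b)
```

Because VaR and CVaR are returned on the return scale, losses appear as negative numbers, matching the sign used in the out-of-sample metric tables.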




| Feature Set: Technical Indicators | |
|---|---|
| Adjusted Close Price | Percentage Change in Adjusted Close Price |
| Aroon Oscillator | Percentage Volume Oscillator |
| Awesome Oscillator | Pretty Good Oscillator |
| Bollinger Band Lower Bound | Psychological Line |
| Bollinger Band Upper Bound | Quantitative Qualitative Estimation |
| Chande Momentum Oscillator | Relative Strength Index—30 periods |
| Commodity Channel Index—30 periods | Relative Vigor Index |
| Correlation Trend Indicator | Simple Moving Average of the Close Price—10 periods |
| Directional Index—30 periods | Simple Moving Average of the Close Price—100 periods |
| Elder-Ray Index—Bear Power | Simple Moving Average of the Close Price—20 periods |
| Elder-Ray Index—Bull Power | Stochastic RSI |
| Inertia Indicator | Super Trend Lower Bound |
| Kaufman’s Efficiency Ratio | Super Trend Upper Bound |
| Know Sure Thing | Gaussian Fisher Transform Price Reversals Indicator |
| Linear Regression of Close Price with Window Size 10 | Triple Exponential Moving Average |
| Log Return | Volume Variation Index |
| Moving Average Convergence Divergence | Z-Score of Close Price—75 periods |

| ETF Tickers | Fund Name | Industry |
|---|---|---|
| IYW | iShares U.S. Technology ETF | Technology |
| IYF | iShares U.S. Financials ETF | Financials |
| IYZ | iShares U.S. Telecommunications ETF | Telecommunications |
| IYM | iShares U.S. Basic Materials ETF | Basic Materials |
| IYK | iShares U.S. Consumer Staples ETF | Consumer Staples |
| IYC | iShares U.S. Consumer Discretionary ETF | Consumer Discretionary |
| IYE | iShares U.S. Energy ETF | Energy |
| IYG | iShares U.S. Financial Services ETF | Financial Services |
| IYH | iShares U.S. Healthcare ETF | Healthcare |
| IYJ | iShares U.S. Industrials ETF | Industrials |
| IDU | iShares U.S. Utilities ETF | Utilities |
| IYR | iShares U.S. Real Estate ETF | Real Estate |

| Stock Tickers | Company Name | Industry |
|---|---|---|
| BA | The Boeing Company | Aerospace and Defense |
| AMGN | Amgen Inc. | Biopharmaceutical |
| DIS | The Walt Disney Company | Broadcasting and entertainment |
| NKE | NIKE | Clothing industry |
| HON | Honeywell International Inc. | Conglomerate |
| MMM | 3M Company | Conglomerate |
| CAT | Caterpillar Inc. | Construction and mining |
| KO | The Coca-Cola Company | Drink industry |
| PG | The Procter & Gamble Company | Fast-moving consumer goods |
| AXP | American Express Company | Financial services |
| GS | The Goldman Sachs Group | Financial services |
| JPM | JPMorgan Chase & Co. | Financial services |
| MCD | McDonald’s Corporation | Food industry |
| HD | The Home Depot | Home Improvement |
| AAPL | Apple Inc | Information technology |
| CRM | Salesforce | Information technology |
| CSCO | Cisco Systems | Information technology |
| IBM | International Business Machines Corporation | Information technology |
| MSFT | Microsoft Corporation | Information technology |
| TRV | The Travelers Companies | Insurance |
| UNH | UnitedHealth Group Incorporated | Managed healthcare |
| CVX | Chevron Corporation | Petroleum industry |
| JNJ | Johnson & Johnson | Pharmaceutical industry |
| MRK | Merck & Co. | Pharmaceutical industry |
| AMZN | Amazon.com Inc | Retailing |
| WMT | Walmart Inc. | Retailing |
| INTC | Intel Corporation | Semiconductor industry |
| VZ | Verizon Communications Inc. | Telecommunications industry |

| Out-of-Sample Metrics | Annual Return | Annual Volatility | VaR | CVaR | Max Drawdown | Sharpe Ratio (Rf = 0%) | Sortino Ratio | Upside Capture | Downside Capture | Beta | Tracking Error | Information Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PPO Mean–Variance | 10.7% | 18.2% | −1.7% | −2.8% | 40.0% | 59.0% | 5.7% | 78.7% | 75.1% | 77.4% | 10.1% | 19.7% |
| PPO Mean–CVaR | 9.1% | 20.9% | −2.1% | −3.3% | 48.9% | 43.7% | 4.5% | 86.5% | 84.7% | 84.3% | 12.2% | 3.0% |
| PPO Mean–Semivariance | 8.3% | 17.8% | −1.7% | −2.8% | 42.6% | 46.8% | 4.7% | 76.2% | 74.6% | 76.1% | 9.8% | −4.6% |
| A2C Mean–Variance | 10.2% | 17.6% | −1.6% | −2.7% | 42.2% | 58.0% | 5.6% | 76.4% | 73.0% | 75.4% | 9.9% | 15.0% |
| A2C Mean–CVaR | 8.6% | 20.9% | −2.0% | −3.3% | 49.4% | 41.2% | 4.3% | 88.2% | 86.9% | 85.8% | 11.7% | −1.0% |
| A2C Mean–Semivariance | 8.6% | 18.6% | −1.8% | −2.9% | 45.5% | 46.1% | 4.7% | 78.9% | 77.2% | 77.1% | 10.8% | −1.8% |
| Benchmark Mean–CVaR | 2.8% | 22.5% | −2.2% | −3.6% | 72.4% | 12.6% | 2.0% | 85.8% | 89.5% | 87.7% | 13.8% | −42.8% |
| Benchmark Mean–Variance | 5.9% | 20.7% | −2.1% | −3.3% | 58.1% | 28.6% | 3.3% | 84.7% | 85.6% | 83.5% | 12.1% | −23.4% |
| Benchmark Mean–Semivariance | 6.1% | 22.9% | −2.3% | −3.7% | 58.1% | 26.6% | 3.2% | 90.0% | 90.9% | 92.6% | 12.9% | −20.7% |
| Ensemble | 11.3% | 19.1% | −1.9% | −3.0% | 41.2% | 59.0% | 5.8% | 79.3% | 75.1% | 77.6% | 11.6% | 21.9% |
| SPY Benchmark | 8.8% | 20.5% | −2.0% | −3.2% | 55.2% | 42.7% | 4.5% | 100.0% | 100.0% | 100.0% | 0.0% | — |

| Out-of-Sample Metrics | Annual Return | Annual Volatility | VaR | CVaR | Max Drawdown | Sharpe Ratio (Rf = 0%) | Sortino Ratio | Upside Capture | Downside Capture | Beta | Tracking Error | Information Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PPO Mean–Variance | 15.7% | 17.7% | −1.5% | −2.6% | 30.7% | 88.7% | 8.3% | 80.7% | 73.3% | 80.3% | 11.5% | 36.0% |
| PPO Mean–CVaR | 14.5% | 22.2% | −2.1% | −3.3% | 30.4% | 65.3% | 6.5% | 100.1% | 95.7% | 97.7% | 14.4% | 20.3% |
| PPO Mean–Semivariance | 10.0% | 18.0% | −1.6% | −2.7% | 33.2% | 55.7% | 5.6% | 75.8% | 73.6% | 74.8% | 13.3% | −11.5% |
| A2C Mean–Variance | 15.5% | 18.1% | −1.6% | −2.7% | 30.2% | 85.8% | 8.0% | 82.0% | 74.8% | 81.8% | 11.7% | 33.7% |
| A2C Mean–CVaR | 10.5% | 23.2% | −2.2% | −3.5% | 35.9% | 45.4% | 4.9% | 99.2% | 98.7% | 98.6% | 15.7% | −6.7% |
| A2C Mean–Semivariance | 12.0% | 18.8% | −1.7% | −2.8% | 28.1% | 63.8% | 6.3% | 80.4% | 76.6% | 78.3% | 13.6% | 3.3% |
| Benchmark Mean–CVaR | 7.0% | 21.0% | −2.0% | −3.2% | 35.8% | 33.2% | 3.7% | 86.5% | 88.7% | 85.5% | 15.2% | −30.2% |
| Benchmark Mean–Variance | 7.7% | 19.9% | −1.9% | −3.0% | 30.7% | 39.0% | 4.2% | 86.3% | 87.9% | 85.4% | 13.5% | −28.4% |
| Benchmark Mean–Semivariance | 5.8% | 22.1% | −2.1% | −3.4% | 37.6% | 26.3% | 3.2% | 88.2% | 91.8% | 89.3% | 15.9% | −36.1% |
| Ensemble | 11.4% | 17.8% | −1.8% | −2.7% | 35.9% | 64.3% | 6.2% | 80.5% | 77.5% | 76.5% | 12.6% | −1.0% |
| DIA Benchmark | 11.6% | 17.3% | −1.6% | −2.6% | 36.7% | 66.9% | 6.3% | 100.0% | 100.0% | 100.0% | 0.0% | — |

| Strategy | Benchmark | Number of Samples | Alpha (ann., %) | p-Value (α) | Confidence Interval—α (%) | ΔSR (Units) | p-Value (ΔSR) | Confidence Interval—ΔSR (Units) |
|---|---|---|---|---|---|---|---|---|
| Ensemble | SPY | 4032 | 4.40% | 0.095 * | [−0.77%, 9.56%] | 0.14 | 0.255 | [−0.26, 0.60] |
| Ensemble | Benchmark Mean–CVaR | 4032 | 8.69% | 0.002 *** | [3.28%, 14.10%] | 0.42 | 0.019 ** | [0.02, 0.90] |
| Ensemble | Benchmark Mean–Variance | 4032 | 6.10% | 0.008 *** | [1.56%, 10.65%] | 0.27 | 0.056 * | [−0.06, 0.65] |
| Ensemble | Benchmark Mean–Semivariance | 4032 | 6.51% | 0.008 *** | [1.71%, 11.31%] | 0.28 | 0.055 * | [−0.06, 0.69] |
| PPO Mean–Variance | SPY | 4032 | 3.75% | 0.086 * | [−0.53%, 8.03%] | 0.14 | 0.213 | [−0.21, 0.50] |
| PPO Mean–Variance | Benchmark Mean–CVaR | 4032 | 8.22% | 0.002 *** | [2.95%, 13.50%] | 0.41 | 0.023 ** | [0.00, 0.90] |
| PPO Mean–Variance | Benchmark Mean–Variance | 4032 | 5.70% | 0.009 *** | [1.45%, 9.94%] | 0.27 | 0.052 * | [−0.06, 0.65] |
| PPO Mean–Variance | Benchmark Mean–Semivariance | 4032 | 6.10% | 0.009 *** | [1.50%, 10.71%] | 0.28 | 0.057 * | [−0.07, 0.69] |

| Strategy | Benchmark | Number of Samples | Alpha (ann., %) | p-Value (α) | Confidence Interval—α (%) | ΔSR (Units) | p-Value (ΔSR) | Confidence Interval—ΔSR (Units) |
|---|---|---|---|---|---|---|---|---|
| Ensemble | DIA | 3024 | 2.90% | 0.400 | [−3.86%, 9.67%] | −0.02 | 0.538 | [−0.63, 0.56] |
| Ensemble | Benchmark Mean–CVaR | 3024 | 6.57% | 0.043 ** | [0.20%, 12.94%] | 0.27 | 0.157 | [−0.27, 0.77] |
| Ensemble | Benchmark Mean–Variance | 3024 | 5.70% | 0.065 * | [−0.36%, 11.76%] | 0.22 | 0.190 | [−0.27, 0.70] |
| Ensemble | Benchmark Mean–Semivariance | 3024 | 7.46% | 0.024 ** | [1.00%, 13.91%] | 0.33 | 0.108 | [−0.21, 0.86] |
| PPO Mean–Variance | DIA | 3024 | 6.18% | 0.035 ** | [0.45%, 11.90%] | 0.19 | 0.206 | [−0.29, 0.66] |
| PPO Mean–Variance | Benchmark Mean–CVaR | 3024 | 10.12% | 0.000 *** | [4.56%, 15.68%] | 0.49 | 0.021 ** | [0.03, 0.98] |
| PPO Mean–Variance | Benchmark Mean–Variance | 3024 | 9.08% | 0.000 *** | [3.99%, 14.17%] | 0.44 | 0.015 ** | [0.03, 0.89] |
| PPO Mean–Variance | Benchmark Mean–Semivariance | 3024 | 10.96% | 0.000 *** | [5.33%, 16.58%] | 0.55 | 0.008 *** | [0.09, 1.05] |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, J.; Chang, K.-C. Smart Tangency Portfolio: Deep Reinforcement Learning for Dynamic Rebalancing and Risk–Return Trade-Off. Int. J. Financial Stud. 2025, 13, 227. https://doi.org/10.3390/ijfs13040227
