Article

Sustainable vs. Non-Sustainable Assets: A Deep Learning-Based Dynamic Portfolio Allocation Strategy

by Fatma Ben Hamadou * and Mouna Boujelbène Abbes
Laboratory LEG, Faculty of Economics and Management of Sfax, Sfax University, Sfax 3018, Tunisia
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2025, 18(10), 563; https://doi.org/10.3390/jrfm18100563
Submission received: 1 August 2025 / Revised: 23 September 2025 / Accepted: 1 October 2025 / Published: 3 October 2025
(This article belongs to the Special Issue Sustainable Finance for Fair Green Transition)

Abstract

This article investigates the impact of sustainable assets on dynamic portfolio optimization under varying levels of investor risk aversion, particularly during turbulent market conditions. The analysis compares the performance of two portfolio types: (i) portfolios composed of non-sustainable assets such as fossil energy commodities and conventional equity indices, and (ii) mixed portfolios that combine non-sustainable and sustainable assets, including renewable energy, green bonds, and precious metals. Both portfolio types are optimized with risk- and transaction-cost-sensitive (RTC) Deep Reinforcement Learning (DRL) models (including TD3 and DDPG) and benchmarked against the traditional Mean-Variance (MV) model. Results show that incorporating clean and sustainable assets significantly enhances portfolio returns and reduces volatility across all risk aversion profiles. Moreover, the DRL optimization models outperform classical MV optimization, and the RTC-LSTM-TD3 strategy outperforms all others. RTC-LSTM-TD3 achieves an annual return of 24.18% and a Sharpe ratio of 2.91 in mixed portfolios (sustainable and non-sustainable assets) under low risk aversion (λ = 0.005), compared to a return of only 8.73% and a Sharpe ratio of 0.67 in portfolios excluding sustainable assets. To the best of the authors’ knowledge, this is the first study that employs a DRL framework integrating risk sensitivity and transaction costs to evaluate the diversification benefits of sustainable assets. The findings offer important implications for portfolio managers seeking to leverage the benefits of sustainable diversification, and for policymakers seeking to encourage the integration of sustainable assets while addressing fiduciary responsibilities.

1. Introduction

Today, the transition to a low-carbon economy is no longer simply an environmental aspiration; it is increasingly taking on a structural dimension in global financial markets. Faced with the climate emergency, investors and regulators are placing increasing emphasis on sustainability and socially responsible investments (SRI), given their key role in long-term environmental and economic performance. This alarming situation has encouraged the emergence of a sustainable financial ecosystem, aligned with the United Nations Sustainable Development Goals (SDGs), through various instruments such as green bonds, clean energy assets, and socially responsible investments (Naqvi et al., 2022).
However, the ESG (Environmental, Social, and Governance) investing landscape has become increasingly controversial in recent years, with growing criticism regarding greenwashing, underperformance relative to traditional assets, and potential mismatch with investors’ return expectations (Edmans, 2023; Bolton & Kacperczyk, 2023). Recent studies have questioned the concept of green investing, arguing that sustainable assets may be only marginally profitable, or even unprofitable, and that combining them with non-sustainable assets could dilute overall portfolio returns, raising concerns about fiduciary duties (Raghunandan & Rajgopal, 2022).
Portfolio managers are always looking for diversification opportunities to minimize risk and enhance returns. This diversification engages assets that offer attractive returns and exhibit low or even negative co-movements with portfolio securities. While investors have traditionally turned to commodities for diversification over the past few decades, environmental risks are of growing concern (Ustaoglu et al., 2021; H. Liu et al., 2021; Y. Zhang et al., 2021; W. Liu et al., 2025). According to recent research, incorporating green assets into portfolios might improve performance and minimize risks, especially during times of crisis (Miralles-Quirós et al., 2019; Kuang, 2021; Hamadou et al., 2024; Mezghani et al., 2024, 2025).
In recent years, the financial market has been marked by extreme events. These events stem from successive crises: the health crisis of the COVID-19 pandemic, geopolitical crises such as the conflict in Ukraine and the Israeli-Palestinian war, and political crises such as the tensions under the Trump presidency. All these events affected the co-movements between financial assets (Ben Hamadou et al., 2024, 2025; Elsayed et al., 2022), making it difficult to apply the traditional mean-variance optimization of Markowitz’s modern portfolio theory. One of the main challenges in applying the mean-variance model is the nonlinearity and nonstationarity of real financial returns, a challenge that research has attempted to address.
Initially, machine learning (ML) and deep learning (DL) offered promising solutions for portfolio optimization. Recent research has demonstrated that the use of predictive models using machine learning and deep learning leads to better-performing portfolios (Ma et al., 2021; Martínez-Barbero et al., 2025). However, using expected returns in portfolio optimization does not reflect the nature of investment decision-making, where market perceptions and actions are intrinsically linked. More importantly, these models fail to integrate the risk preferences of investors into the learning process.
To address this major challenge, deep reinforcement learning (DRL) is emerging as a revolutionary new method for portfolio optimization. The Deep Reinforcement Learning (DRL) model combines the strengths of Deep Learning (DL) and Reinforcement Learning (RL) for better portfolio optimization in a financial environment characterized by strong turbulence. This model enables the development of robust and dynamic strategies for complex markets by automatically integrating investors’ risk aversion and transaction cost constraints through an adapted reward function.
Despite the growing interest in sustainable finance, there are limitations in recent studies on the effectiveness of integrating clean (low-carbon) assets into portfolios composed of non-sustainable assets to improve their portfolio performance. In fact, most studies have focused on green portfolios compared to conventional portfolios, but few have explicitly examined the marginal contribution of clean assets to risk reduction and return improvement when combined with traditionally high-emitting assets, while accounting for fiduciary responsibilities to deliver expected returns.
This article aims to bridge this gap by investigating the impact of integrating sustainable financial assets (e.g., renewable energy, green bonds) into portfolio optimization using several risk- and transaction-cost-sensitive Deep Reinforcement Learning optimization models (including TD3 and DDPG combined with LSTM and CNN feature extractors), benchmarked against the traditional Mean-Variance model. The analysis evaluates the added value of this diversification in terms of portfolio performance, particularly in turbulent market conditions, addressing recent criticisms of the profitability of sustainable assets while aligning with fiduciary obligations. Furthermore, the study tests the optimization models across a range of risk aversion scenarios for different types of investors, from risk-seeking to highly risk-averse, by varying the risk aversion coefficient (λ). Finally, the analysis compares the effectiveness of different Deep Reinforcement Learning (DRL) optimization models to assess the robustness of the results in a non-stationary market environment.
This article offers several key contributions to the literature on sustainable finance and portfolio optimization. First, to the authors’ knowledge, this is the first study to combine artificial intelligence and sustainability in a portfolio optimization framework, offering new perspectives on the diversification opportunities of clean assets. Second, this paper provides insights into the portfolio allocation and interaction between sustainable and non-sustainable assets in terms of risk-adjusted return dynamics during times of high turbulence, thus supporting the role of sustainable investments in modern portfolios while addressing fiduciary concerns. Third, this paper contributes to overcoming the limitations of traditional Markowitz optimization by applying the Deep Reinforcement Learning (DRL) method, thus providing a robust framework that adapts to different investor risk aversion profiles (λ). Finally, it offers a comparative analysis of the DRL agent architectures (LSTM- and CNN-based) and their capacity to dynamically learn allocation strategies, relative to the classical Mean-Variance model, in turbulent market environments.
The findings of this article have several implications. First of all, for investors and portfolio managers, this research can guide investment decisions by demonstrating whether clean assets can improve portfolio performance without breaching fiduciary responsibilities to prioritize financial returns. Thus, the integration of clean assets could encourage regulatory incentives favoring sustainable finance and accelerate the transition to a low-carbon economy. This study paves the way for further research on AI-based adaptive portfolio optimization methods and their applications in sustainability-focused investment strategies.
The rest of this article is organized as follows: Section 2 presents the literature review. Section 3 details the methodology. Section 4 describes the data. Section 5 presents the results. Section 6 discusses them, and Section 7 concludes with the main contributions of this research.

2. Literature Review

In recent years, the literature on green finance has attracted increasing attention and interest. Despite the growing popularity of this topic, exploration of its potential implications for portfolio optimization remains limited. This section reviews sustainable finance in portfolio optimization and the different applications of artificial intelligence in portfolio optimization. It identifies gaps in existing research and proposes hypotheses to guide the investigation of dynamic portfolio strategies using Deep Reinforcement Learning (DRL).

2.1. Sustainable Assets and Diversification

Empirical studies show that sustainable assets, such as green bonds and renewable energy indices, offer different diversification benefits. For instance, Miralles-Quirós et al. (2019) show that exchange-traded funds (ETFs) focused on renewable energy exhibit lower correlations with fossil fuel assets, thus reducing the overall portfolio volatility. In a similar vein, Kuang (2021) demonstrates that green bonds maintain their hedging role during economic downturns. Mezghani et al. (2024, 2025) extend this argument by demonstrating that during periods of high turbulence (e.g., the COVID-19 pandemic, Russian-Ukrainian conflict), portfolios that incorporate green bonds achieve lower risk compared to those fully composed of high-emission assets. Furthermore, Kuang (2021) shows that the renewable energy sector offers the best risk-adjusted returns compared to other polluting sectors. Beyond diversification, green assets have been studied for their ability to serve as a hedge or safe haven, particularly during periods of heightened uncertainty. The COVID-19 pandemic has been a real-life testing ground. Yousaf et al. (2022) demonstrate that green bonds were the only asset class, among a wide range of alternative investments, to serve as a safe haven against sharp stock market fluctuations during the health crisis. Díaz et al. (2022), using copula modeling, also confirm the diversifying role of socially responsible investments during the pandemic. Saeed et al. (2020) demonstrate that clean energy stocks and green bonds have more effective hedging than non-sustainable energy assets. These diversification benefits directly address fiduciary concerns, as sustainable investments can enhance long-term portfolio resilience without sacrificing returns. Naqvi et al. (2022) and Kölbel et al. (2020) argue, via ESG impact theory, that sustainable assets mitigate environmental risks and attract ESG-conscious capital, aligning with fiduciary duties. 
Conversely, Edmans (2023) and Bolton and Kacperczyk (2023) caution that green assets may yield marginal returns, while Raghunandan and Rajgopal (2022) highlight greenwashing risks. Yet, the literature lacks evidence on how sustainable assets perform in dynamic portfolios under varying risk aversion, particularly when optimized with AI-driven methods, leaving a gap in reconciling diversification benefits with fiduciary obligations.

2.2. Artificial Intelligence and Portfolio Optimization

Modern portfolio management is theoretically based on Markowitz’s theory (Markowitz, 1952), which proposes a mean-variance optimization framework. However, this model is very sensitive to estimation errors in its input parameters (expected returns and covariance matrix). To overcome these limitations, and with the emergence of new technologies, research has turned to artificial intelligence, and more specifically to machine learning (ML) and deep learning (DL) models. These techniques are used to improve the quality of the predictions that are then fed into the optimization models. For example, López de Prado et al. (2025) employ returns forecasted by machine learning models to improve the robustness of the Markowitz model results. In a similar vein, several studies have developed predictive models based on machine learning frameworks, such as long short-term memory (LSTM) networks or random forests, to predict future asset returns. These predictions are then used as input data in a mean-variance optimization model. Ma et al. (2021) and Martínez-Barbero et al. (2025) show that portfolios constructed from predicted returns outperform the standard benchmark, even during periods of high turbulence and weak market conditions. Similarly, Ta et al. (2020) confirm that LSTM-driven optimization yields better Sharpe ratios than naïve approaches. However, Sarkar et al. (2025) highlight the limitations of these techniques, arguing that minimizing the prediction error does not systematically guarantee a reduction in the final decision error. This finding motivates the exploration of deep reinforcement learning (DRL) techniques, which explicitly model sequential decision-making under uncertainty. DRL differs from machine learning and deep learning optimization in portfolio management because it learns to act rather than to predict.
The deep reinforcement learning (DRL) control problem involves an agent learning an investment policy and interacting directly with the market environment (Choudhary et al., 2025).
The application of deep reinforcement learning in finance has grown in recent years. For instance, Q-learning-based algorithms have been used to determine optimal allocations (Pigorsch & Schäfer, 2022). Additionally, policy gradient-based models, such as the Deep Deterministic Policy Gradient (DDPG), have been used to maximize returns while achieving diversification objectives (Lin et al., 2025). This method handles continuous action spaces and dynamically rebalances large portfolios with transaction cost constraints. Lin et al. (2025) took this logic further by proposing recurrent DRL architectures for real-time portfolio management that respect predefined risk tolerance constraints. In a similar vein, Proximal Policy Optimization (PPO) and Actor–Critic algorithms (e.g., A2C, A3C) have been used for trading and asset allocation. Yang et al. (2020) show that PPO achieves superior Sharpe ratios compared to traditional benchmarks across equities and bonds. More recent innovations incorporate the Soft Actor Critic (SAC) method, which includes regularization to encourage exploration. Using this methodology, H. Liu et al. (2021) report lower volatility and higher cumulative returns for multi-asset allocation than Markowitz and Black–Litterman optimizations. These results suggest that deep reinforcement learning does not merely replicate existing strategies but offers a dynamic learning process that adapts to market regimes for each asset class. While Markowitz assumes static return distributions, DRL continuously learns from market feedback, making it robust in environments marked by structural changes, such as the transition to low-carbon investments.

2.3. Research Gaps and Hypotheses

The literature reveals two critical gaps. First, no research has explored the application of DRL to optimize portfolios combining sustainable and non-sustainable assets, particularly in turbulent markets, despite the proven diversification benefits of sustainable assets (Miralles-Quirós et al., 2019; Kuang, 2021; Mezghani et al., 2025). Second, there is limited evidence on how dynamic optimization can balance sustainable goals with fiduciary responsibilities across varying risk aversion levels (Raghunandan & Rajgopal, 2022). This study pioneers the use of a risk- and transaction-cost-sensitive DRL framework to optimize mixed portfolios, addressing these gaps by testing their performance in volatile conditions.
Based on the literature, the following hypotheses are proposed:
H1. 
Integrating sustainable assets into non-sustainable portfolios enhances performance and reduces risk.
H2. 
Risk- and transaction cost–sensitive Deep Reinforcement Learning models outperform mean–variance optimization.
H3. 
The benefits of sustainable asset integration hold across different risk aversion profiles.

3. Methodology

This section outlines the methodological framework employed to investigate the role of incorporating sustainable assets into traditional non-sustainable portfolios. The analysis is conducted using a risk-sensitive Deep Reinforcement Learning framework inspired by the approaches of Jiang et al. (2024) and Choudhary et al. (2025). While traditional mean-variance optimization forms the foundation of portfolio selection, it relies on restrictive assumptions that often fail during times of market turbulence and high volatility. Recent machine learning methods improve forecasts, but they remain static and do not adequately capture investor preferences or transaction costs. Deep reinforcement learning overcomes these limitations by dynamically incorporating risk aversion and transaction costs into the learning process, making it suitable for analyzing the benefits of sustainable asset diversification. Moreover, recent studies (D. Zhang et al., 2020; Martínez-Barbero et al., 2025) report that DRL outperforms static optimization methods in financial applications.

3.1. Asset Allocation Problem Under Portfolio Constraints

A discrete-time financial market comprising $N$ risky assets over an investment horizon $T$ is considered. The investor dynamically allocates capital by adjusting portfolio weights in response to observed market dynamics. $\mathbf{P}_t = [p_{1,t}, p_{2,t}, \ldots, p_{N,t}]^{\top} \in \mathbb{R}_{+}^{N}$ is the vector of asset prices at time $t$, and $p_{n,t}$ denotes the closing price of the $n$-th asset at time $t$. Similarly, $m_{n,t} \in \mathbb{Z}_{+}$ represents the number of shares held of the $n$-th asset, and the scalar $P_t \in \mathbb{R}$ (not bold) denotes the total portfolio value at time $t$.
During each period, investors execute portfolio trading strategies involving buying, selling, or holding assets, which correspond to an increase, decrease, or no change in the number of shares held for each asset, respectively. In this context, the portfolio selection decision variable is referred to as the trading action, denoted by $k_{n,t}$, which represents the trading decision for the $n$-th asset at time $t$. For each asset, this variable can take one of three values, $\{-1, 0, 1\}$, indicating selling, holding, and buying, respectively. This framework can be generalized to allow multiple shares of each asset to be traded, with $k_{n,t}$ taking values from the domain $\{-k, \ldots, -1, 0, 1, \ldots, k\}$, where $-k$ and $k$ represent the maximum number of shares a trader can sell or buy in a period. To ensure constraints on trading volume, traders impose a maximum allowable number of shares, $k \le k_{max}$. As a result, the number of shares held for the $n$-th asset at time $t+1$ is updated as $m_{n,t+1} = m_{n,t} + k_{n,t}$. The portfolio value at time $t$ evolves according to:
$$P_t = P_{t-1} + \mathbf{P}_t^{\top} \mathbf{k}_t$$
where $\mathbf{P}_t^{\top}$ is the transpose of the price vector at time $t$, and $\mathbf{k}_t$ is the vector of trading actions for all assets.
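The two update rules of this subsection can be illustrated with a minimal sketch. The function names and the numerical inputs are hypothetical; only the update rules themselves (holdings plus trading action, previous value plus price-weighted action) are taken from the text.

```python
# Illustrative sketch; helper names and inputs are hypothetical.

def step_holdings(holdings, actions, k_max):
    """Apply m_{n,t+1} = m_{n,t} + k_{n,t}, enforcing |k_{n,t}| <= k_max."""
    if any(abs(k) > k_max for k in actions):
        raise ValueError("trading action exceeds k_max")
    updated = [m + k for m, k in zip(holdings, actions)]
    # Holdings are non-negative integers in the text (no short positions).
    if any(m < 0 for m in updated):
        raise ValueError("holdings must stay non-negative")
    return updated


def step_value(prev_value, prices, actions):
    """Apply the stated value recursion: previous value plus price vector times action vector."""
    return prev_value + sum(p * k for p, k in zip(prices, actions))


prices = [10.0, 20.0, 5.0]   # closing prices of three assets
holdings = [3, 1, 0]         # shares held before trading
actions = [1, -1, 2]         # buy 1, sell 1, buy 2

new_holdings = step_holdings(holdings, actions, k_max=2)
new_value = step_value(100.0, prices, actions)
```

In this toy case the purchases and the sale offset each other in value, so the portfolio value is unchanged while the holdings shift.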

3.2. Incorporating Transaction Costs

To reflect realistic trading conditions, the model incorporates proportional transaction costs, as in Yang et al. (2020). Let $\xi \in \mathbb{R}_{+}$ denote the constant transaction cost rate applied across all assets. The transaction cost at time $t$ is then formulated as:
$$c_t^{\mathrm{tran}} = \xi \cdot \mathbf{P}_t^{\top} |\mathbf{k}_t|$$
where $|\mathbf{k}_t|$ represents the vector of absolute values of the trading actions, capturing both buying and selling activities, and $\mathbf{P}_t^{\top}$ is the transpose of the price vector at time $t$.
Although the transaction cost constant ξ is assumed to be uniform across all assets, this assumption can be extended to account for varying transaction costs and liquidity differences between assets.
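As a rough illustration of the proportional cost above, the sketch below charges a rate on the absolute traded notional. The function name is hypothetical, the rate 0.0005 is the value reported later in the reward specification, and the per-asset variant is only one possible way to implement the extension mentioned in the previous paragraph.

```python
# Illustrative sketch; the function name and the per-asset rates are assumptions.

def transaction_cost(prices, actions, xi=0.0005):
    """Proportional cost on traded notional: rate * price * |shares traded|, summed over assets.

    `xi` may be a single rate (uniform case from the text) or a list of
    per-asset rates (hypothetical extension for heterogeneous liquidity).
    """
    rates = list(xi) if isinstance(xi, (list, tuple)) else [xi] * len(prices)
    return sum(rate * p * abs(k) for rate, p, k in zip(rates, prices, actions))


prices = [10.0, 20.0, 5.0]
actions = [1, -1, 2]  # buys and sells are both charged

uniform_cost = transaction_cost(prices, actions)
per_asset_cost = transaction_cost(prices, actions, xi=[0.0005, 0.001, 0.0005])
```

Because the absolute value is taken before pricing, a sale of one share costs the same as a purchase of one share of the same asset.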

3.3. Risk Consideration and Risk Aversion Modeling

A central component of portfolio optimization is risk sensitivity, which significantly influences the objective function. Two elements are addressed: the intrinsic risk of the portfolio position, captured by the portfolio variance $\sigma_t^2$, and the investor’s attitude toward that risk, captured by the risk aversion coefficient λ.
The investor’s risk aversion coefficient λ penalizes portfolio volatility in the reward function and reflects the degree of risk tolerance. This parameter should not be confused with the CAPM beta (β), which measures the systematic risk of an asset relative to the market.
The portfolio variance is approximated using the sample variance of historical portfolio returns, expressed as:
$$\sigma_t^2 = \frac{1}{t} \sum_{i=1}^{t} \left( \frac{P_i - P_{i-1}}{P_{i-1}} - \mu_t \right)^2$$
where $\mu_t$ is the mean return of the portfolio over the past $t$ periods.
The literature offers various approaches to integrate the risk aversion coefficient into portfolio optimization models. A common and widely accepted method in financial economics is the use of utility functions to represent investor preferences. Popular utility functions include those with constant absolute risk aversion (CARA) and constant relative risk aversion (CRRA) (Chambers & Quiggin, 2007). The present framework adopts the Markowitz framework, introducing λ as a parameter that quantifies the investor’s degree of risk aversion.
To model the investor’s attitude toward portfolio risk, a risk cost function is incorporated into the objective function, defined as:
$$c_t^{\mathrm{risk}} = \lambda \cdot \sigma_t^2$$
This term ensures that the optimization not only seeks performance maximization but also aligns with the investor’s risk profile, penalizing excessive volatility.
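A small sketch may help make the risk cost concrete. It computes the sample variance of simple portfolio returns with the 1/t normalization used above and scales it by λ. The function names and the value path are hypothetical; λ = 0.005 is the low-risk-aversion setting quoted in the abstract.

```python
# Illustrative sketch; helper names and the value path are hypothetical.

def portfolio_returns(values):
    """Simple returns (P_i - P_{i-1}) / P_{i-1} from a path of portfolio values."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


def portfolio_variance(values):
    """Sample variance of returns with 1/t normalization, as in the text."""
    rets = portfolio_returns(values)
    mu = sum(rets) / len(rets)
    return sum((r - mu) ** 2 for r in rets) / len(rets)


def risk_cost(values, lam):
    """Risk penalty: risk aversion coefficient times portfolio variance."""
    return lam * portfolio_variance(values)


values = [100.0, 110.0, 99.0]   # returns of +10% and -10%
variance = portfolio_variance(values)
penalty = risk_cost(values, lam=0.005)
```

A larger λ scales the same variance into a larger penalty, which is how the framework distinguishes more risk-averse investors from less risk-averse ones.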

3.4. Market Environment as a Markov Decision Process (MDP)

The portfolio optimization problem is framed as a Markov Decision Process, which provides a structure for sequential decision-making under uncertainty (Bäuerle & Rieder, 2011). A Markov Decision Process is defined by a tuple (State space, Action space, Transition probability, Reward function, and Discount factor).
The State Space ($s_t$) captures at each time all information available to the agent. To detect temporal patterns, a lookback window of the last 60 trading days of historical returns is used for each asset. The state is, thus, a matrix of size $N \times 60$, where $N$ is the number of assets. The 60-day horizon captures roughly three months of trading activity, providing sufficient history to detect trends while remaining responsive to market changes (Ma et al., 2021). The state space is defined as:
$$s_t = (\mathbf{P}_t, \mathbf{m}_{t-1}) \in \mathcal{S}$$
where
$\mathbf{P}_t$ is the price vector of assets at time $t$,
$\mathbf{m}_{t-1}$ is the vector of asset holdings at time $t-1$.
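The 60-day lookback window described above can be sketched as a simple slicing operation. This is only an illustration: the function name is hypothetical, the returns are synthetic, and whether the window ends at day t or at day t-1 is an assumption not settled by the text.

```python
# Illustrative sketch; the function name, the synthetic data, and the choice
# to include day t in the window are assumptions.

def lookback_state(returns_by_asset, t, window=60):
    """Return an N x window matrix holding each asset's last `window` returns up to day t."""
    if t + 1 < window:
        raise ValueError("not enough history for a full window")
    return [series[t + 1 - window : t + 1] for series in returns_by_asset]


# Three assets with 100 synthetic daily returns each.
data = [[(i + n) * 0.001 for i in range(100)] for n in range(3)]
state = lookback_state(data, t=99)
```

Each row of `state` corresponds to one asset and each column to one trading day, matching the N × 60 shape given in the text.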
The Action Space ($a_t$) is the portfolio weight vector $\mathbf{w}_t = [w_{1,t}, w_{2,t}, \ldots, w_{N,t}]$, with the following constraints applied at each decision step: $\sum_{i=1}^{N} w_{i,t} = 1$ and $w_{i,t} \ge 0$ for all $i$.
This formulation reflects a long-only, fully invested portfolio with no short selling. To enforce these constraints, the final layer of the actor network uses a softmax activation, which naturally outputs a valid weight vector.
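The way a softmax output layer enforces the long-only, fully invested constraints can be seen in a few lines. The sketch below is a plain-Python softmax applied to arbitrary raw scores; the scores are hypothetical stand-ins for the actor network's pre-activation outputs.

```python
import math

# Illustrative sketch; the raw scores stand in for actor-network outputs.

def softmax(scores):
    """Map raw scores to non-negative weights that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


weights = softmax([2.0, 0.5, -1.0, 0.0])
```

Every exponential is strictly positive, so each weight is positive, and dividing by the total makes the weights sum to one. No separate projection step is needed.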
The reward function $r_t(s_t, a_t, s_{t+1})$ captures the performance of the portfolio after executing the trading action $a_t$ at state $s_t$ and transitioning to the new state $s_{t+1}$. It incorporates:
Portfolio Value: The immediate change in the portfolio’s value, denoted as $\mathbf{P}_t^{\top} \mathbf{m}_t$.
Risk Cost: A penalty term proportional to the portfolio’s volatility $\sigma_t^2$ and the investor’s risk aversion coefficient λ.
Transaction Cost: A penalty term representing transaction costs, proportional to the trading volume $|\mathbf{k}_t|$ and the transaction cost rate ξ, set to 0.0005.
$$r_t = \mathbf{P}_t^{\top} \mathbf{m}_t - \lambda \sigma_t^2 - \xi \, \mathbf{P}_t^{\top} |\mathbf{k}_t|$$
The transition probability $P(s_{t+1} \mid s_t, a_t)$ is determined by the market and unknown to the agent, making this a model-free RL setting. The discount factor γ is set to 0.99, prioritizing long-term performance over short-term gains.
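To make the reward and the role of the discount factor concrete, the following sketch assembles the three reward terms and accumulates a discounted return. Function names and inputs are hypothetical; the rate 0.0005 and the discount factor 0.99 are the values stated in the text.

```python
# Illustrative sketch; function names and numerical inputs are hypothetical.

def reward(prices, holdings, actions, variance, lam, xi=0.0005):
    """Position value minus risk penalty minus proportional transaction cost."""
    position_value = sum(p * m for p, m in zip(prices, holdings))
    risk_penalty = lam * variance
    trading_penalty = xi * sum(p * abs(k) for p, k in zip(prices, actions))
    return position_value - risk_penalty - trading_penalty


def discounted_return(rewards, gamma=0.99):
    """Sum of rewards weighted by gamma**t, computed backwards for stability."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


r_t = reward(
    prices=[10.0, 20.0],
    holdings=[1, 2],
    actions=[1, -1],
    variance=0.04,
    lam=0.5,
)
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With γ close to one, as in the text, rewards many steps ahead retain most of their weight, which is what favors long-term performance.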

3.5. Deep Reinforcement Learning Agent Architecture

Several DRL agents are implemented based on the Actor–Critic framework, where the Actor proposes actions (portfolio weights) and the Critic evaluates them. The algorithms are based on two frameworks, TD3 and DDPG. TD3 (Twin Delayed Deep Deterministic Policy Gradient) uses two critic networks and delayed policy updates to address overestimation bias and improve stability (Fujimoto et al., 2018), while DDPG (Deep Deterministic Policy Gradient) is a standard continuous-action actor–critic algorithm (Lillicrap et al., 2015). To process the 60-day return window, the framework compares two types of feature extractors: the LSTM (Long Short-Term Memory), which captures long-term temporal dependencies in financial data (Ma et al., 2021), and the CNN (Convolutional Neural Network), which captures local spatial patterns in the time series.
These components yield four agent variants (RTC-LSTM-TD3, RTC-LSTM-DDPG, RTC-CNN-TD3, and RTC-CNN-DDPG).
The architecture follows Jiang et al. (2024): the feature extractor feeds into actor and critic networks to produce a continuous action (weight vector).
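The difference between the two algorithms can be illustrated by their bootstrap targets. The sketch below follows the generic descriptions in Fujimoto et al. (2018) and Lillicrap et al. (2015), not the authors' implementation: TD3 bootstraps from the smaller of two target-critic estimates and updates the actor less often than the critics. The function names are hypothetical, and the delay of 2 is the commonly used default rather than a value reported in this article.

```python
# Illustrative sketch of generic TD3/DDPG targets; names and the delay of 2
# are assumptions, not settings reported in the article.

def ddpg_target(reward, next_q, gamma=0.99, done=False):
    """Single-critic target: reward plus discounted next-state value."""
    return reward + (0.0 if done else gamma * next_q)


def td3_target(reward, next_q1, next_q2, gamma=0.99, done=False):
    """Clipped double-Q target: bootstrap from the smaller of the two critics."""
    return reward + (0.0 if done else gamma * min(next_q1, next_q2))


def actor_update_steps(n_critic_updates, policy_delay=2):
    """Critic-update steps (1-indexed) at which the delayed actor update also runs."""
    return [s for s in range(1, n_critic_updates + 1) if s % policy_delay == 0]


y_ddpg = ddpg_target(1.0, next_q=10.0, gamma=0.5)
y_td3 = td3_target(1.0, next_q1=10.0, next_q2=8.0, gamma=0.5)
schedule = actor_update_steps(6)
```

Taking the minimum of the two critics yields a more conservative target whenever one critic overestimates, which is the mechanism behind the overestimation-bias correction mentioned above.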

3.6. Portfolio Performance

The empirical application is structured into two stages: training and testing. The training phase employs daily closing prices from 2 January 2018, until 10 October 2023, while the testing phase assesses performance on data from 11 October 2023, to 28 March 2025. To ensure robust evaluation, an intermediate validation step is conducted, leveraging in-sample data to assess the quality and efficacy of the trained portfolio model. Following this validation, the finalized model is applied to the test dataset, simulating trading performance in real-world conditions.
This rigorous two-stage framework ensures that the portfolio construction methodology and the learning agent’s performance are evaluated comprehensively, highlighting the adaptability and effectiveness of the proposed approach in dynamic financial markets.
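The chronological split can be expressed as a pair of date filters. The sketch below uses the dates given above; the function name and the sample rows are hypothetical, and the intermediate validation step is not modeled because its exact window is not specified in the text.

```python
from datetime import date

# Dates taken from the text; everything else is illustrative.
TRAIN_START, TRAIN_END = date(2018, 1, 2), date(2023, 10, 10)
TEST_START, TEST_END = date(2023, 10, 11), date(2025, 3, 28)


def split_by_date(rows):
    """Split (date, value) pairs into training and testing sets without shuffling."""
    train = [r for r in rows if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in rows if TEST_START <= r[0] <= TEST_END]
    return train, test


rows = [
    (date(2018, 1, 2), 1.0),
    (date(2023, 10, 10), 2.0),
    (date(2023, 10, 11), 3.0),
    (date(2025, 3, 28), 4.0),
    (date(2025, 4, 4), 5.0),   # outside the stated test window
]
train, test = split_by_date(rows)
```

Splitting by date rather than at random keeps every test observation strictly after the training data, so the agent is never evaluated on periods it has already seen.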
Portfolio performance is evaluated by comparing different metrics among strategies. One of the primary metrics is the cumulative portfolio value, which quantifies the growth in portfolio wealth over the investment period. Specifically, the cumulative portfolio value $P_T^{net}$ is calculated as the net increment in the portfolio’s value across time $t$:
$$P_T^{net} = P_0 + \sum_{t=1}^{T} \mathbf{P}_t^{\top} \left( \mathbf{k}_t - \xi |\mathbf{k}_t| \right)$$
where $P_0$ represents the initial portfolio wealth, set to $P_0 = 1$, and $T$ denotes the total duration of the investment period. The transaction cost rate ξ is explicitly included in this calculation to reflect the costs incurred during trading. The cumulative return $S_T$ at the end of the investment period $T$ is then defined as follows:
$$S_T = \frac{P_T^{net} - P_0}{P_0}$$
where $P_T^{net}$ is the net portfolio value at the end of the period, and $P_0$ is the initial portfolio wealth. While cumulative return measures overall gains, it does not account for the risk embedded in the portfolio. To complement this, the Sharpe ratio is a widely used metric that evaluates portfolio returns relative to the associated risk, providing a risk-adjusted performance measure. It is expressed as:
$$SR = \frac{E[p_t] - p_f}{\sigma(p_t)}$$
where $p_t$ represents the portfolio return, $p_t = \frac{P_t - P_{t-1}}{P_{t-1}}$, $p_f$ is the risk-free return, and $\sigma(p_t)$ is the unconditional volatility of $p_t$, calculated as $\sigma(p_t) = \sqrt{Var(p_t)}$.
Another critical metric for evaluating portfolio performance is the Maximum Drawdown (MDD), which quantifies the maximum potential loss from a portfolio’s peak to its lowest point, defined as:
$$MDD = \max_{\tau > t} \frac{P_t^{net} - P_{\tau}^{net}}{P_t^{net}}$$
To provide a combined perspective on risk and return, the Calmar ratio (CR) is introduced. This metric incorporates the MDD to deliver a comprehensive evaluation of the portfolio’s risk-adjusted performance. The Calmar ratio is defined as:
$$CR = \frac{P_T^{net}}{MDD}$$
The Calmar ratio helps investors align portfolio strategies with their risk tolerance and investment goals.
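The four metrics of this subsection can be computed from a single path of net portfolio values, as in the sketch below. Function names and the value path are hypothetical. The Sharpe ratio is computed per period; annualizing it would need a scaling convention that the text does not state. The Calmar ratio follows the definition given here (final net value over MDD), whereas a more common variant divides an annualized return by the MDD.

```python
import math

# Illustrative sketch; function names and the value path are hypothetical.

def simple_returns(values):
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]


def cumulative_return(values):
    """(final value - initial value) / initial value."""
    return (values[-1] - values[0]) / values[0]


def sharpe_ratio(values, risk_free=0.0):
    """Per-period mean excess return over return volatility (undefined if volatility is zero)."""
    rets = simple_returns(values)
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / len(rets)
    return (mean - risk_free) / math.sqrt(var)


def max_drawdown(values):
    """Largest relative fall from a running peak to a later value."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def calmar_ratio(values):
    """Final net value divided by MDD, following the definition in the text."""
    return values[-1] / max_drawdown(values)


values = [1.0, 1.2, 0.9, 1.5]   # starts at the initial wealth of 1
s_t = cumulative_return(values)
mdd = max_drawdown(values)
cr = calmar_ratio(values)
sr = sharpe_ratio(values)
```

In this path the worst loss is the fall from 1.2 to 0.9, even though the portfolio later recovers to a new high.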

4. Data and Preliminary Analysis

4.1. Data

The dataset comprises daily prices for a set of non-sustainable assets (Natural Gas, Brent Crude Oil, S&P 500, NASDAQ, Treasury Bond ETF) and sustainable assets (Wind, Water, Solar, First Trust NASDAQ Clean Energy, S&P Green Bond Index, S&P GSCI Precious Metals) over the period from 2 January 2018 until 4 April 2025. This time frame encompasses several market distress periods, including the COVID-19 pandemic, the Russia–Ukraine war, and the recent geopolitical tensions in the Middle East, and therefore provides a challenging setting in which to test portfolio strategies. All data were extracted from Refinitiv Datastream to ensure consistency and reliability. As financial markets operate five days per week, the dataset excludes weekends. Occasional missing observations due to market holidays were forward-filled to maintain a continuous daily frequency.
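Forward-filling, as used for the holiday gaps, carries the last observed price into each missing slot. The sketch below is a plain-Python illustration with a hypothetical function name and made-up prices, where `None` marks a missing observation.

```python
# Illustrative sketch; the function name and prices are hypothetical.

def forward_fill(series):
    """Replace each missing value (None) with the most recent observed value."""
    filled, last = [], None
    for x in series:
        if x is not None:
            last = x
        filled.append(last)   # stays None until a first observation exists
    return filled


prices = [101.2, None, None, 103.5, None]
filled = forward_fill(prices)
```

A forward-filled day produces a zero return on that day, which is worth keeping in mind when interpreting volatility estimates around holidays.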
For non-sustainable assets, Natural Gas and Brent Crude Oil are representative of traditional energy commodities with high carbon intensity; they are used as benchmarks for global energy prices (D. Zhang et al., 2020). Furthermore, the S&P 500 and NASDAQ indices represent broad market exposure to developed-market equities, which contain significant shares of high-emission industries (H. Liu et al., 2021). The Treasury Bond ETF represents fixed-income securities considered as safe havens, but with no explicit sustainability focus.
On the other hand, Wind, Water, and Solar indices represent renewable energy markets that correspond with sustainable finance objectives (Miralles-Quirós et al., 2019). Furthermore, the First Trust NASDAQ Clean Energy ETF tracks firms involved in the production or development of clean energy. The S&P Green Bond Index includes bonds issued to finance environmental projects (Kuang, 2021). The S&P GSCI Precious Metals index is an alternative investment with historically low correlations to energy markets (Baur & Lucey, 2010).
Figure 1 illustrates the price performance of sustainable and non-sustainable assets over the study period. Sustainable indices, such as Wind and Solar, as well as First Trust NASDAQ Clean Energy, exhibit high volatility and marked growth phases that coincide with the economic stimulus measures implemented as the COVID-19 pandemic eased. Non-sustainable commodities, such as Brent crude and natural gas, reflect cyclical supply and demand shocks, as well as geopolitical risk premiums.
A notable observation is that the prices of all assets declined toward the end of 2024, particularly those of sustainable assets. This period coincides with the 2024 US presidential election. One possible explanation is that, during this period, investors shifted their focus from clean energy toward profitability.

4.2. Preliminary Analysis

Table 1 reports the descriptive statistics for the daily returns of all assets. All return series were computed as log differences of daily prices. Most assets exhibit positive mean returns, while the S&P Green Bond Index shows a negative mean. Volatility, measured by the standard deviation of returns, shows that S&P GSCI Precious Metals is the least volatile market, followed by the S&P Green Bond Index. Skewness values show that only Natural Gas and the Treasury Bond ETF exhibit positive skewness; all other indices, mainly the sustainable ones, exhibit negative skewness, consistent with crash dynamics in times of political uncertainty (Karkowska & Urjasz, 2023). Kurtosis values significantly exceed three for all series, highlighting fat-tailed distributions, a well-documented feature of financial returns that motivates the use of robust optimization methods (Cont, 2001). The Jarque–Bera test rejects the null hypothesis of normally distributed returns. The Augmented Dickey–Fuller (ADF) tests show that almost all series are stationary at the 1% significance level.
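As an illustration of the statistics discussed above, the following sketch computes log returns, skewness, kurtosis, and the Jarque–Bera statistic with the standard library only. The price series is hypothetical and the moments use the population (divide by n) convention, which may differ slightly from the software used for Table 1.

```python
import math

def log_returns(prices):
    """Daily log returns: r_t = ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def moments(returns):
    """Mean, standard deviation, skewness and kurtosis (population moments)."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    return mean, std, skew, kurt

def jarque_bera(returns):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4).
    Under normality it is asymptotically chi-square with 2 degrees of freedom."""
    n = len(returns)
    _, _, skew, kurt = moments(returns)
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Hypothetical price path, for illustration only.
prices = [100.0, 101.0, 99.5, 100.2, 97.0, 98.1, 98.4]
rets = log_returns(prices)
mean, std, skew, kurt = moments(rets)
jb = jarque_bera(rets)
print(f"mean={mean:.5f} std={std:.5f} skew={skew:.3f} kurt={kurt:.3f} JB={jb:.3f}")
```

A large JB value relative to the chi-square critical value (about 5.99 at the 5% level) would lead to rejecting normality; the ADF test is not reproduced here because it requires a regression-based routine.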
The violin plots in Figure 2 provide a visual representation of the distribution of daily returns for each asset over the sample period. Violin plots combine a boxplot with a kernel density estimate to convey the shape, symmetry, and tails of the return distributions (Hintze & Nelson, 1998).
The majority of assets exhibit return distributions that are not perfectly symmetric. For instance, the violin plots for the Wind and Solar indices show more pronounced lower tails, indicating that these assets experienced larger negative returns than positive ones. This asymmetry aligns with findings in renewable energy markets, where periods of high turbulence can cause sharp price declines (Karkowska & Urjasz, 2023).
The width of each violin reflects the density of returns at each level. A violin that extends over a wider range of returns indicates higher variability around the median. Commodities such as Brent Crude Oil and Natural Gas display broader distributions, which is consistent with their higher standard deviations and with prior research showing that energy commodities exhibit significant volatility (Ustaoglu et al., 2021). Equity indices (S&P 500, NASDAQ) have narrower, more concentrated distributions, reflecting relatively more stable returns during the same period.
Figure 3 presents the heatmap of pairwise Pearson correlations across all assets. Several insights emerge. Non-sustainable assets exhibit strong positive correlations with each other (e.g., S&P 500 and NASDAQ > 0.9), reflecting their shared exposure to market fluctuations (H. Liu et al., 2021). Furthermore, sustainable assets display moderate to low correlations with non-sustainable assets, particularly Wind and Solar indices, which show correlations below 0.3 with non-sustainable assets. This confirms the potential for diversification illustrated in prior studies (Miralles-Quirós et al., 2019; Saeed et al., 2020).
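The pairwise Pearson correlations behind a heatmap such as Figure 3 can be sketched as follows. The return series and asset labels are hypothetical placeholders, not the study's data, and the helper names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(series):
    """Nested dict {asset_a: {asset_b: correlation}} for all asset pairs."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}

# Hypothetical daily returns for three placeholder assets.
series = {
    "equity_a": [0.010, -0.020, 0.015, 0.003, -0.007],
    "equity_b": [0.012, -0.018, 0.020, 0.001, -0.010],
    "wind": [-0.004, 0.006, 0.001, -0.012, 0.009],
}
corr = correlation_matrix(series)
for a in corr:
    print(a, [round(corr[a][b], 2) for b in corr])
```

Each row of the printed matrix corresponds to one row of the heatmap; the diagonal equals one by construction.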
Interestingly, the S&P GSCI Precious Metals index shows low correlations with both sustainable and non-sustainable assets, validating its role as a safe haven (Baur & Lucey, 2010). The S&P Green Bond Index maintains weak correlations with non-sustainable assets but moderate links with Treasury bonds, consistent with its fixed-income nature (Kuang, 2021). These correlations are essential for portfolio construction. In mean-variance optimization, diversification benefits arise when assets are imperfectly correlated (Markowitz, 1952). In a dynamic deep reinforcement learning framework, low correlations widen the set of useful hedging allocations, enabling the agent to exploit these diversification opportunities in real time.
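The diversification argument can be made concrete with the standard two-asset variance formula, σ_p² = w₁²σ₁² + w₂²σ₂² + 2w₁w₂ρσ₁σ₂. The sketch below uses hypothetical inputs (equal weights and 20% volatility for both assets) purely to show how portfolio volatility falls as the correlation ρ decreases.

```python
import math

def portfolio_volatility(w1, sigma1, sigma2, rho):
    """Volatility of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2.0 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(variance)

# Hypothetical inputs: equal weights, both assets with 20% volatility.
for rho in (1.0, 0.9, 0.3, 0.0):
    vol = portfolio_volatility(0.5, 0.20, 0.20, rho)
    print(f"rho={rho:.1f} -> portfolio volatility={vol:.4f}")
```

With perfectly correlated assets the portfolio is as volatile as each asset, whereas a correlation of 0.3 or below removes a noticeable share of the risk at unchanged weights.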

5. Results

This section outlines the backtesting results of portfolio optimization for two portfolio types: a portfolio of non-sustainable assets and a mixed portfolio combining non-sustainable and sustainable assets. The analysis begins by comparing four DRL portfolio optimization strategies (RTC-LSTM-TD3, RTC-LSTM-DDPG, RTC-CNN-TD3, and RTC-CNN-DDPG) with the traditional Mean-Variance model for a risk aversion coefficient of λ = 0.005 and a transaction cost rate of ξ = 0.0005. This scenario (λ = 0.005; ξ = 0.0005) serves as the reference because the penalization of risk and transaction costs is minimal, which allows the intrinsic performance of each strategy to be analyzed without strong constraints (Jiang et al., 2024).
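To give an intuition for how λ and ξ enter the optimization, the sketch below assumes a simple reward of the form "portfolio return minus λ times a variance estimate minus ξ times turnover". This functional form, the function names, and all numerical inputs are assumptions made for illustration; the exact RTC reward is the one defined in the methodology section and may differ.

```python
def turnover(old_weights, new_weights):
    """Sum of absolute weight changes between two rebalancing dates."""
    return sum(abs(n - o) for o, n in zip(old_weights, new_weights))

def rtc_reward(old_weights, new_weights, asset_returns, variance_estimate,
               lam=0.005, xi=0.0005):
    """Assumed reward: gross return - lam * variance - xi * turnover.
    lam is the risk aversion coefficient, xi the transaction cost rate."""
    gross = sum(w * r for w, r in zip(new_weights, asset_returns))
    risk_penalty = lam * variance_estimate
    cost_penalty = xi * turnover(old_weights, new_weights)
    return gross - risk_penalty - cost_penalty

# Hypothetical one-step example with two assets.
old_w = [0.5, 0.5]
new_w = [0.7, 0.3]
step_returns = [0.01, -0.02]
variance = 0.0004  # hypothetical estimate of portfolio variance

reference = rtc_reward(old_w, new_w, step_returns, variance)
strict = rtc_reward(old_w, new_w, step_returns, variance, lam=0.05)
print(f"reward at lam=0.005: {reference:.6f}")
print(f"reward at lam=0.05:  {strict:.6f}")
```

Under this assumed form, both penalties are small in the reference scenario, so the reward is driven mainly by the realized return; raising λ lowers the reward of any allocation in proportion to its estimated variance.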
Table 2 (Panel A) and Figure 4 report the results for the portfolio composed of non-sustainable assets. The results show that the Deep Reinforcement Learning optimization models outperform classical MV optimization. This is in line with the findings of Jiang et al. (2024) and H. Liu et al. (2021), which show that DRL models such as TD3 and PPO outperform classical benchmarks. RTC-LSTM-TD3 records the highest performance among all optimization strategies. For instance, it achieves an annual return of around 8.73% and a cumulative return of 12.99%, outperforming the other deep reinforcement learning (DRL) methods such as RTC-LSTM-DDPG, which records an annual return of 5.85%. It also exhibits the highest Sharpe ratio (0.67) and a Calmar ratio of 1.22, surpassing the second-best optimization model, RTC-CNN-TD3, which has a Sharpe ratio of 0.48 and a Calmar ratio of 0.89. These results indicate that combining LSTM and TD3 networks improves the model’s ability to capture long-term time dependencies and adapt to market return dynamics (Ma et al., 2021; Ta et al., 2020). TD3’s twin critics and delayed policy updates reduce overestimation bias, leading to more stable and profitable portfolio decisions (Aboussalah & Lee, 2020).
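The two TD3 mechanisms mentioned above can be sketched in a few lines. This is a generic illustration of the clipped double-Q target and the delayed actor update, not the authors' implementation; the function names, the discount factor, and the critic values are hypothetical, and target policy smoothing is omitted for brevity.

```python
def td3_target(reward, gamma, q1_next, q2_next, done):
    """Clipped double-Q target: bootstrap from the smaller of the two
    target-critic estimates to limit overestimation bias."""
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap

def should_update_actor(step, policy_delay=2):
    """Delayed policy update: the actor is updated once every
    policy_delay critic updates."""
    return step % policy_delay == 0

# Hypothetical values: the two target critics disagree about the next state.
target = td3_target(reward=1.0, gamma=0.99, q1_next=10.0, q2_next=8.0, done=False)
print(f"critic target: {target:.2f}")
print([should_update_actor(step) for step in range(1, 5)])
```

Taking the minimum of the two critics makes the target deliberately conservative, while updating the actor less often lets the value estimates settle before the policy moves.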
For the same scenario (λ = 0.005, ξ = 0.0005) applied to the mixed portfolio of sustainable and non-sustainable assets (Table 2 (Panel B) and Figure 5), RTC-LSTM-TD3 is again the best optimization method. It achieves an annual return of 24.18% and a cumulative return of 37.31%, outperforming RTC-LSTM-DDPG (16.18% annual; 24.57% cumulative) as well as RTC-CNN-TD3 (18.23% annual; 27.79% cumulative). RTC-LSTM-TD3 also displays the best risk-adjusted indicators, with a Sharpe ratio of 2.91 and a Calmar ratio of 5.69, thus outperforming the mean-variance (MV) benchmark, which achieves only a Sharpe ratio of 0.24 and a Calmar ratio of 0.22. These results confirm that reinforcement learning models, especially RTC-LSTM-TD3, perform better and more efficiently than traditional MV optimization in turbulent markets, as also reported by Song et al. (2023) and Martínez-Barbero et al. (2025).
Comparing the performance measures of the non-sustainable portfolio and the mixed portfolio in Table 2 (Panels A and B) shows that incorporating sustainable assets enhances portfolio performance relative to portfolios that include only non-sustainable assets. For RTC-LSTM-TD3, the annual volatility decreases from 13.03% (non-sustainable assets) to only 8.29% (mixed portfolio), while the cumulative return rises from 12.99% to 37.31%. Similar improvements are observed across other strategies: for instance, RTC-CNN-TD3’s volatility falls from 14.19% to 8.35%, with a jump in cumulative return from 10.07% to 27.79%.
In terms of the Sharpe ratio, RTC-LSTM-TD3 improves from 0.67 (non-sustainable) to 2.91 (mixed), indicating better risk-adjusted performance. Similarly, the Calmar ratio increases substantially from 1.23 to 5.69, confirming stronger performance relative to drawdown risk. Across all strategies, Sharpe and Calmar ratios are consistently higher in the mixed portfolios, suggesting improved efficiency and stability. Moreover, the maximum drawdown for RTC-LSTM-TD3 is also notably lower in the mixed portfolio (−4.25%) than in the non-sustainable one (−7.12%), indicating better resilience during market downturns. This pattern is repeated for other methods, such as RTC-CNN-TD3, where the drawdown decreases from −7.61% to −4.91% and the Sharpe ratio improves from 0.48 to 2.18.
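As a worked check, the risk-adjusted ratios can be recomputed from the reported annual return, annual volatility, and maximum drawdown of RTC-LSTM-TD3. The sketch assumes a zero risk-free rate in the Sharpe ratio and a Calmar ratio defined as annual return over the absolute maximum drawdown; under these assumptions the reported figures are reproduced up to rounding.

```python
def sharpe_ratio(annual_return, annual_volatility, risk_free=0.0):
    """Excess annual return per unit of annual volatility.
    The zero risk-free rate is an assumption of this sketch."""
    return (annual_return - risk_free) / annual_volatility

def calmar_ratio(annual_return, max_drawdown):
    """Annual return per unit of absolute maximum drawdown."""
    return annual_return / abs(max_drawdown)

# RTC-LSTM-TD3 figures quoted in the text (in percent), lam = 0.005.
reported = {
    "non-sustainable": {"ret": 8.73, "vol": 13.03, "mdd": -7.12},
    "mixed": {"ret": 24.18, "vol": 8.29, "mdd": -4.25},
}

ratios = {}
for name, m in reported.items():
    ratios[name] = (
        sharpe_ratio(m["ret"], m["vol"]),
        calmar_ratio(m["ret"], m["mdd"]),
    )
    print(f"{name}: Sharpe={ratios[name][0]:.2f} Calmar={ratios[name][1]:.2f}")
```

The improvement in both ratios comes from two sources at once: a higher numerator (annual return) and a lower denominator (volatility or drawdown) in the mixed portfolio.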
Contrary to criticisms that green investments yield marginal returns (Yousaf et al., 2022; Karkowska & Urjasz, 2023), these findings demonstrate that integrating sustainable assets into traditional non-sustainable portfolios not only aligns with sustainability objectives but also fulfills fiduciary responsibilities by delivering superior risk-adjusted returns.
Figure 6 compares the cumulative returns of the non-sustainable portfolio with those of the mixed (sustainable and non-sustainable) portfolio for the different optimization strategies. For all strategies, the mixed portfolio curve lies above that of the non-sustainable portfolio while exhibiting a smoother trajectory with less variability. For instance, for RTC-LSTM-TD3, the mixed portfolio not only outperforms in terms of cumulative return but also shows less pronounced drawdowns, confirming that adding sustainable assets absorbs shocks and reduces volatility. The mixed portfolios were also more resilient than the non-sustainable portfolios during the market disruptions observed in late 2024, a period characterized by heightened uncertainty related to geopolitical tensions in the Middle East and the US election. Portfolios composed of non-sustainable assets exhibited pronounced fluctuations and declines during this period, whereas the mixed portfolios maintained stable trajectories. This finding confirms the ability of sustainable assets to absorb shocks in periods of high turbulence and distress. These results highlight the diversification benefits of low-carbon assets. In fact, green assets generally have low or negative correlations with traditional assets, which reduces portfolio risk and strengthens resilience to shocks (Miralles-Quirós et al., 2019; Kuang, 2021). Saeed et al. (2020) also show that renewable energy stocks act as hedges in highly turbulent markets, thus improving risk-adjusted returns. Furthermore, sustainable assets act as stabilizers, reducing exposure to transition risks and regulatory uncertainties linked to carbon-intensive industries (Kölbel et al., 2020). They also make portfolios more attractive to ESG-conscious investors, creating a virtuous circle of capital flows into sustainable finance.
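The drawdown comparison discussed above relies on the maximum drawdown of each cumulative return curve. A minimal sketch of that calculation follows; the equity curve is hypothetical and the function name is illustrative.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline of a portfolio value series,
    returned as a negative fraction (0.0 if the series never falls)."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)            # running maximum so far
        worst = min(worst, value / peak - 1.0)
    return worst

# Hypothetical portfolio values over six periods.
curve = [100.0, 110.0, 99.0, 105.0, 120.0, 114.0]
mdd = max_drawdown(curve)
print(f"maximum drawdown: {mdd:.2%}")
```

In this example the deepest decline is the fall from the peak of 110 to 99, which is larger than the later decline from 120 to 114; a smoother curve produces a smaller value in absolute terms.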
In general, results show that the integration of sustainable assets helps to reduce excessive volatility and subsequently improves financial performance. These findings directly address concerns about the marginal profitability of green investments by demonstrating that sustainable assets can significantly enhance portfolio returns while reducing risk, thereby aligning with the fiduciary responsibilities of portfolio managers to prioritize competitive returns for investors. Furthermore, these results promote the allocation of capital and assets towards more sustainable sectors, thus contributing to long-term growth and risk minimization. Moreover, these results highlight that sustainable investments and sustainable assets can be profitable while supporting climate objectives (Naqvi et al., 2022).
To conclude, under the reference scenario (λ = 0.005, ξ = 0.0005), where risk and transaction cost penalties are minimal, RTC-LSTM-TD3 outperforms the other reinforcement learning methods and the MV model in both the non-sustainable and the mixed portfolios. The integration of sustainable assets improves returns, reduces volatility, and enhances the overall performance of all portfolio optimization strategies. These findings highlight the benefits of sustainability for financial efficiency (Miralles-Quirós et al., 2019; Naqvi et al., 2022). This evidence supports H1 and H2, showing that sustainable assets improve resilience and risk-adjusted returns in volatile conditions, and that the risk- and transaction cost-sensitive Deep Reinforcement Learning models outperform mean–variance optimization. However, these results are based on low risk aversion, which raises the question of whether they also hold for more risk-averse investors.

5.1. Portfolio Performance Under Different Risk Aversions (Variation of λ)

The previous results were obtained under the assumption of low risk aversion (λ = 0.005). A crucial question is whether these findings hold for investors with different risk profiles. Accordingly, the performance of the strategies is investigated under varying levels of risk aversion. The analysis focuses primarily on the RTC-LSTM-TD3, which exhibits the highest performance.
Table 3 (Panel A) presents the performance metrics of the non-sustainable assets’ portfolio optimized by the RTC-LSTM-TD3 agent across four levels of risk aversion, from low (λ = 0.005) to very high (λ = 0.05). When λ increases, the agent is more sensitive to risk in its reward function, penalizing volatility more strongly. Results reveal a progressive decline in returns as risk aversion rises. The annual return falls from 8.73% at λ = 0.005 to 4.04% at λ = 0.05, while the cumulative return decreases from 12.99% to 5.97%. At the same time, volatility is reduced from 13.03% to 7.20%. This behavior is consistent with the theoretical expectation that a higher λ leads the agent to allocate more weight to safer, low-variance assets (Brini & Tantari, 2023; Song et al., 2023).
The Sharpe ratio decreases moderately (0.67 → 0.56). The Calmar ratio also decreases (1.22 → 0.86), even though the maximum drawdown narrows from −7.12% to −4.64%. This suggests that, while risk is reduced, returns fall proportionally more than risk under higher risk aversion. A risk-averse agent thus manages to better protect its capital but sacrifices a significant portion of its return potential (Ma et al., 2021; Ta et al., 2020). These results are in line with the findings of Jiang et al. (2024) that an increase in the risk aversion coefficient λ is accompanied by a decrease in the annual volatility of portfolios.
Similar results are observed for the mixed portfolio (sustainable and non-sustainable assets) in Table 3 (Panel B). When λ reaches its maximum (0.05), the agent becomes very conservative: the annual return decreases from 24.18% (at λ = 0.005) to 9.07% and the cumulative return drops to 13.56%, but the annual volatility falls drastically to 3.74% and the maximum drawdown to −2.12%. It is noteworthy that the Sharpe ratio remains very high (2.42) and the Calmar ratio remains solid (5.69), indicating that the mixed portfolio preserves a high return per unit of risk, even under strict risk constraints. This evidence validates H3, since the benefits of sustainable assets persist across different risk profiles.
The contribution of sustainable assets is revealed by directly comparing the performance of the two portfolios across all risk profiles. Figure 7 illustrates the cumulative return trends of the non-sustainable and mixed (sustainable and non-sustainable) portfolios across different risk profiles. For all risk aversion levels tested (λ = 0.015, λ = 0.03, and λ = 0.05), the mixed portfolio significantly outperforms the non-sustainable portfolio in terms of cumulative return. In each scenario, the mixed portfolio generates a higher final cumulative return, and the performance gap between the two portfolios widens over time. The two portfolios perform similarly during the initial phase (late 2023 to early 2024), but a clear divergence appears around mid-2024, which corresponds to the periods of high turbulence and geopolitical risk. From that point on, the mixed portfolio establishes a strong upward trend, while the non-sustainable portfolio struggles to make progress, exhibiting high volatility without sustained growth and even periods of decline.
The trajectory of the mixed portfolio is visibly more stable. It shows greater resilience during market declines, with less severe and shorter-lived drops than those of the non-sustainable portfolio. This visual evidence of lower volatility corroborates the quantitative findings and highlights the benefits of including clean and sustainable assets in terms of risk reduction. The non-sustainable portfolio is more vulnerable to market shocks. The increase in the risk aversion parameter λ from 0.015 to 0.05 leads both portfolios to adopt a more conservative stance, resulting in lower overall cumulative returns. At λ = 0.05, the cumulative return of the non-sustainable portfolio reaches only 6%, while the mixed portfolio achieves a stronger return of over 13%. This result demonstrates that the mixed portfolio with sustainable and non-sustainable assets offers more diversification opportunities that help to minimize overall risk.
These different graphs illustrate the powerful diversification benefits of sustainable assets in a dynamic DRL framework. Thanks to sustainable assets, the DRL agent is able to construct a better-performing portfolio, less sensitive to market turbulence, and capable of generating more consistent growth. This result is in line with the findings of Le et al. (2021) and Pham (2021), who demonstrate that green assets perform particularly well during bullish and bearish extremes, thanks to their time-frequency hedging properties.
Furthermore, the integration of sustainable assets is not only an environmental choice but also a financially attractive strategy, offering superior risk-adjusted returns regardless of the investor’s risk tolerance. These results corroborate the study of Mezghani et al. (2024), which demonstrates that green bonds constitute effective diversification assets, serving as safe havens, particularly against equity and oil markets during turbulent periods such as the COVID-19 pandemic or the Russia–Ukraine conflict. Similarly, Han and Li (2022) show that the inclusion of green bonds in a portfolio allows for a significant reduction in volatility and an improved return-to-risk profile. These results directly address fiduciary concerns by demonstrating that sustainable assets do not compromise returns, even under strict risk constraints, but instead enhance portfolio resilience and risk-adjusted performance. This contradicts claims of marginal profitability for green investments (Karkowska & Urjasz, 2023) and supports the notion that portfolio managers can integrate sustainable assets without breaching their fiduciary obligations. This directly confirms H3 by showing that sustainable integration remains effective at both low and high levels of risk aversion.

5.2. Robustness Check of Risk Aversion Results: RTC-CNN-TD3

To assess the robustness of the previous findings on risk aversion, the analysis is extended by testing the same risk aversion scenarios (λ = 0.005 to λ = 0.05) with a second model rather than relying only on RTC-LSTM-TD3. The RTC-CNN-TD3 optimization model, identified as the second-best performing model in the initial analysis, is therefore employed.
Table 4 (Panel A and Panel B) presents the results for the non-sustainable and mixed (sustainable and non-sustainable assets) portfolios, respectively. For the non-sustainable portfolio (Table 4 (Panel A)), the RTC-CNN-TD3 agent exhibits adaptive behavior similar to that of the LSTM-based agent. As the risk aversion λ increases, the annual return decreases, while the annual volatility is significantly reduced from 14.20% to 8.17%. This confirms that the Deep Reinforcement Learning (DRL) optimization framework dynamically adjusts its strategy to meet the investor’s risk constraints, a behavior consistent with the findings of Brini and Tantari (2023) and Song et al. (2023). However, the overall portfolio performance is lower than that of the LSTM-based model, with Sharpe ratios stagnating around 0.35–0.48, reinforcing the idea that LSTMs are better at capturing the purely temporal dynamics of financial data (Ma et al., 2021).
For the mixed portfolio (sustainable and non-sustainable assets) (Table 4 (Panel B)), RTC-CNN-TD3 shows higher Sharpe and Calmar ratios than for the non-sustainable portfolio across all λ levels, reaching 2.33 at λ = 0.05. These results support the argument that integrating sustainable assets enhances risk-adjusted returns. Although the cumulative return drops from 27.79% to 10.08%, the model still delivers consistent performance under risk constraints. The consistent outperformance of the mixed portfolio across different risk aversion levels demonstrates that sustainable assets can enhance portfolio returns and resilience, enabling portfolio managers to fulfill their fiduciary responsibilities while aligning with ESG objectives.
This robustness check confirms and supports the advantages of sustainable investments and sustainable assets for better diversification opportunities. While the LSTM-based agent achieves the highest performance, the CNN-based agent validates previous findings, thereby enhancing the credibility and generalizability of the main conclusions of this article.

6. Discussion

This study delivers several important insights into sustainable asset allocation and advanced portfolio optimization. First, the integration of sustainable assets into non-sustainable portfolios increases returns and reduces volatility. These findings confirm the diversification benefits of green assets highlighted by Miralles-Quirós et al. (2019) and Kuang (2021). Mezghani et al. (2024, 2025) also demonstrate that green bonds serve as effective hedges in turbulent markets. The present results extend this evidence by showing that sustainable assets improve risk-adjusted performance measures such as the Sharpe and Calmar ratios across a broad set of market conditions. This result supports H1.
Second, Deep Reinforcement Learning (DRL) models consistently outperform the mean–variance framework. This evidence supports the conclusions of Jiang et al. (2024) and Martínez-Barbero et al. (2025), who argue that DRL agents adapt better to changing market regimes than static optimization models. The results in this paper go further by demonstrating that DRL models explicitly designed to account for risk and transaction costs, such as RTC-LSTM-TD3, achieve superior performance in volatile environments. This result confirms H2.
Third, the benefits of sustainable assets persist across different levels of investor risk aversion. This finding contributes directly to the debate on fiduciary responsibility. Critics such as Edmans (2023) and Raghunandan and Rajgopal (2022) argue that green investments may deliver marginal or inconsistent returns. In contrast, the evidence presented here shows that sustainable assets strengthen portfolio resilience without compromising profitability. This outcome demonstrates that fiduciary duties and sustainable investing are not conflicting goals but rather complementary strategies for long-term value creation. This result validates H3.
In sum, the results position sustainable assets as both a financial and strategic resource: they enhance portfolio resilience, support fiduciary obligations, and accelerate the transition toward a low-carbon economy.

7. Conclusions

The transition to a low-carbon economy and growing financial market volatility pose a challenge for investors to align their portfolios with sustainability objectives and manage increased risks while fulfilling fiduciary responsibilities. Deep reinforcement learning (DRL) offers new perspectives for dynamic portfolio optimization.
This paper investigates the role of integrating sustainable assets into a portfolio of non-sustainable assets using a recent Deep Reinforcement Learning (DRL) framework that is sensitive to risk and transaction costs (RTC). The study compares several Deep Reinforcement Learning models (TD3 and DDPG combined with LSTM and CNN feature extractors) against the traditional Mean-Variance model. The analysis covers a turbulent period from January 2018 to April 2025, which was marked by high volatility and multiple crises.
The study employs daily prices for a set of non-sustainable assets (Natural Gas, Brent Crude Oil, S&P 500, NASDAQ, Treasury Bond ETF) and sustainable assets (Wind, Water, Solar, First Trust NASDAQ Clean Energy, S&P Green Bond Index, S&P GSCI Precious Metals). The analysis is based on two portfolios, a non-sustainable portfolio and a mixed portfolio combining non-sustainable and sustainable assets, evaluated under different optimization strategies.
The results yield several findings. First, the study confirms that Deep Reinforcement Learning (DRL) models, and in particular the RTC-LSTM-TD3 optimization model, significantly outperform traditional Mean-Variance optimization, providing more robust and profitable strategies in turbulent markets. Furthermore, adding sustainable assets (renewable energy, green bonds) to a portfolio of non-sustainable assets (fossil fuels, traditional indices) enhances portfolio performance. For instance, under low risk aversion (λ = 0.005), the RTC-LSTM-TD3 optimization achieves an annual return of 8.73% with a Sharpe ratio of 0.67 in the non-sustainable assets’ portfolio, while the mixed portfolio (non-sustainable and sustainable assets) records an annual return of 24.18% with a Sharpe ratio of 2.91. The mixed portfolio generates cumulative returns nearly three times higher than those of the non-sustainable portfolio (37.31% versus 12.99%), while reducing overall volatility by nearly 40% (from 13.03% to 8.29%). This result highlights the diversification benefits of low-carbon assets. Portfolios composed of non-sustainable assets exhibited pronounced fluctuations and declines during the market disruptions observed in late 2024, a period characterized by heightened uncertainty related to geopolitical tensions in the Middle East and the US election, whereas the mixed portfolio maintained a stable trajectory. This result is in line with the findings of Hammoudeh et al. (2020), Reboredo et al. (2020), Reboredo and Ugolini (2020) and Mensi et al. (2022), which document the ability of sustainable assets to absorb shocks in periods of high turbulence and distress.
Moreover, the analysis is extended by testing four levels of risk aversion, from low (λ = 0.005) to very high (λ = 0.05). When λ increases, the agent is more sensitive to risk in its reward function, penalizing volatility more strongly. The results exhibit a progressive decline in returns and volatility as risk aversion rises. Even under high risk aversion (λ = 0.05), the mixed portfolio outperforms the non-sustainable portfolio in terms of risk-adjusted measures, confirming that sustainable assets contribute to better Sharpe and Calmar ratios across investor profiles. These results are consistent with previous research (Brini & Tantari, 2023; Song et al., 2023; Jiang et al., 2024), which shows that a higher risk aversion coefficient λ leads the agent to allocate more weight to safer, low-variance assets.
In addition, a robustness check is conducted using the RTC-CNN-TD3 model. The stability of the findings is confirmed. Despite different model architectures, the integration of sustainable assets into non-sustainable assets consistently improves portfolio resilience and performance under various risk scenarios.
Furthermore, this article offers important insights into the current debate over fiduciary responsibility. Historically, trustees have been reluctant to allocate capital to green or sustainable assets, fearing marginal profitability and a potential conflict with their commitment to maximizing risk-adjusted returns (Richardson, 2009; Sandberg, 2011). However, the findings demonstrate that the integration of sustainable assets improves both returns and risk-adjusted performance across different investor risk profiles. These results suggest that integrating sustainable assets does not conflict with fiduciary duty; on the contrary, it reinforces it by improving the long-term resilience of the portfolio and protecting it against significant financial risks. Therefore, sustainable asset allocation should not be interpreted as a trade-off between ethics and performance, but rather as a necessary step to meet fiduciary obligations in the current financial environment.
This research has several implications for investors, portfolio managers, policymakers, and regulators. For investors, the findings show that including sustainable assets with traditional non-sustainable assets in a portfolio allocation leads to better returns, lower volatility, and reduced drawdowns, even during periods of high turbulence. This evidence allows individual and institutional investors to confidently allocate their capital to sustainable assets without fear of compromising their performance. For portfolio managers, the results provide practical guidance on adopting new techniques based on deep reinforcement learning (RTC-DRL), taking into account risks and transaction costs. Indeed, thanks to models like RTC-LSTM-TD3, managers can implement adaptive allocation policies that dynamically react to market conditions and integrate sustainability objectives while maintaining competitive financial results, thereby fulfilling fiduciary duties. For policymakers and regulators, the findings of this research highlight the important role of sustainable finance and the importance of creating regulatory environments and incentives that support the integration of environmental, social, and governance (ESG) considerations into portfolio construction to foster wider adoption of sustainable finance practices, thereby contributing to broader economic and environmental objectives (Kölbel et al., 2020; Naqvi et al., 2022).
This study is limited by its focus on U.S. assets, such as the S&P 500, NASDAQ, and U.S. sustainable indices (e.g., First Trust NASDAQ Clean Energy, S&P Green Bond Index). This restricts the generalizability of the findings to other markets, which may exhibit distinct risk-return dynamics or regulatory environments.
Future research could enhance the generalizability of these findings by incorporating assets from diverse global markets, including emerging markets and region-specific sustainable assets (e.g., European green bonds or Asian renewable energy indices). This would allow for a broader assessment of the diversification benefits of sustainable assets across different economic and regulatory contexts.

Author Contributions

Conceptualization, F.B.H. and M.B.A.; methodology, F.B.H.; software, F.B.H.; validation, F.B.H. and M.B.A.; formal analysis, F.B.H. and M.B.A.; investigation, F.B.H.; resources, F.B.H.; data curation, F.B.H.; writing—original draft preparation, F.B.H. and M.B.A.; writing, F.B.H.; visualization, F.B.H.; supervision, M.B.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aboussalah, A. M., & Lee, C. G. (2020). Continuous control with stacked deep dynamic recurrent reinforcement learning for portfolio optimization. Expert Systems with Applications, 140, 112891.
  2. Baur, D. G., & Lucey, B. M. (2010). Is gold a hedge or a safe haven? An analysis of stocks, bonds and gold. Financial Review, 45(2), 217–229.
  3. Bäuerle, N., & Rieder, U. (2011). Markov decision processes with applications to finance. Springer Science & Business Media.
  4. Ben Hamadou, F., Mezghani, T., & Boujelbène Abbes, M. (2025). Quantile-time-frequency risk spillover between investor attention, clean, and dirty cryptocurrency returns. Risk Management, 27(3), 10.
  5. Ben Hamadou, F., Mezghani, T., Zouari, R., & Boujelbène-Abbes, M. (2024). Forecasting bitcoin returns using machine learning algorithms: Impact of investor sentiment. EuroMed Journal of Business, 20(1), 179–200.
  6. Bolton, P., & Kacperczyk, M. (2023). Global pricing of carbon-transition risk. The Journal of Finance, 78(6), 3677–3754.
  7. Brini, A., & Tantari, D. (2023). Deep reinforcement trading with predictable returns. Physica A: Statistical Mechanics and its Applications, 622, 128901.
  8. Chambers, R. G., & Quiggin, J. (2007). Dual approaches to the analysis of risk aversion. Economica, 74(294), 189–213.
  9. Choudhary, H., Orra, A., Sahoo, K., & Thakur, M. (2025). Risk-adjusted deep reinforcement learning for portfolio optimization: A multi-reward approach. International Journal of Computational Intelligence Systems, 18(1), 126.
  10. Cont, R. (2001). Empirical properties of asset returns: Stylized facts and statistical issues. Quantitative Finance, 1(2), 223.
  11. Díaz, A., Esparcia, C., & López, R. (2022). The diversifying role of socially responsible investments during the COVID-19 crisis: A risk management and portfolio performance analysis. Economic Analysis and Policy, 75, 39–60.
  12. Edmans, A. (2023). The end of ESG. Financial Management, 52(1), 3–17.
  13. Elsayed, A. H., Naifar, N., Nasreen, S., & Tiwari, A. K. (2022). Dependence structure and dynamic connectedness between green bonds and financial markets: Fresh insights from time-frequency analysis before and during COVID-19 pandemic. Energy Economics, 107, 105842.
  14. Fujimoto, S., Hoof, H., & Meger, D. (2018, July 10–15). Addressing function approximation error in actor-critic methods. International Conference on Machine Learning (pp. 1587–1596), Stockholm, Sweden.
  15. Hamadou, F. B., Mezghani, T., & Abbes, M. B. (2024). Time-varying nexus and causality in the quantile between Google investor sentiment and cryptocurrency returns. Blockchain: Research and Applications, 5(2), 100177.
  16. Hammoudeh, S., Ajmi, A. N., & Mokni, K. (2020). Relationship between green bonds and financial and environmental variables: A novel time-varying causality. Energy Economics, 92, 104941.
  17. Han, Y., & Li, J. (2022). Should investors include green bonds in their portfolios? Evidence for the USA and Europe. International Review of Financial Analysis, 80, 101998.
  18. Hintze, J. L., & Nelson, R. D. (1998). Violin plots: A box plot-density trace synergism. The American Statistician, 52(2), 181–184.
  19. Jiang, Y., Olmo, J., & Atwi, M. (2024). Deep reinforcement learning for portfolio selection. Global Finance Journal, 62, 101016.
  20. Karkowska, R., & Urjasz, S. (2023). How does the Russian-Ukrainian war change connectedness and hedging opportunities? Comparison between dirty and clean energy markets versus global stock indices. Journal of International Financial Markets, Institutions and Money, 85, 101768.
  21. Kölbel, J. F., Heeb, F., Paetzold, F., & Busch, T. (2020). Can sustainable investing save the world? Reviewing the mechanisms of investor impact. Organization & Environment, 33(4), 554–574.
  22. Kuang, W. (2021). Which clean energy sectors are attractive? A portfolio diversification perspective. Energy Economics, 104, 105644.
  23. Le, T. L., Abakah, E. J. A., & Tiwari, A. K. (2021). Time and frequency domain connectedness and spill-over among fintech, green bonds and cryptocurrencies in the age of the fourth industrial revolution. Technological Forecasting and Social Change, 162, 120382.
  24. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv, arXiv:1509.02971.
  25. Lin, W., Olmo, J., & Taamouti, A. (2025). Portfolio selection under systemic risk. Journal of Money, Credit and Banking, 57(4), 905–949.
  26. Liu, H., Zhu, G., & Li, Y. (2021). Research on the impact of environmental risk perception and public participation on evaluation of local government environmental regulation implementation behavior. Environmental Challenges, 5, 100213.
  27. Liu, W., Tang, M., & Zhao, P. (2025). The effects of attention to climate change on carbon, fossil energy and clean energy markets: Based on causal network learning algorithms. Energy Strategy Reviews, 59, 101717.
  28. López de Prado, M., Simonian, J., Fabozzi, F. A., & Fabozzi, F. J. (2025). Enhancing Markowitz’s portfolio selection paradigm with machine learning. Annals of Operations Research, 346(1), 319–340.
  29. Ma, Y., Han, R., & Wang, W. (2021). Portfolio optimization with return prediction using deep learning and machine learning. Expert Systems with Applications, 165, 113973.
  30. Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1), 77–91.
  31. Martínez-Barbero, X., Cervelló-Royo, R., & Ribal, J. (2025). Portfolio optimization with prediction-based return using long short-term memory neural networks: Testing on upward and downward European markets. Computational Economics, 65(3), 1479–1504.
  32. Mensi, W., Naeem, M. A., Vo, X. V., & Kang, S. H. (2022). Dynamic and frequency spillovers between green bonds, oil and G7 stock markets: Implications for risk management. Economic Analysis and Policy, 73, 331–344.
  33. Mezghani, T., Ben Hamadou, F., Boujelbène, M., & Boutouria, S. (2024). Do green bonds affect extreme spillover and hedging effects across stocks and commodities? International Journal of the Economics of Business, 31(2), 103–130.
  34. Mezghani, T., Ben Hamadou, F., & Boujelbène-Abbes, M. (2025). Network connectedness and portfolio hedging of green bonds, stock markets and commodities. International Journal of Emerging Markets, 20(5), 2154–2181.
  35. Miralles-Quirós, J. L., Miralles-Quirós, M. M., & Nogueira, J. M. (2019). Diversification benefits of using exchange-traded funds in compliance to the sustainable development goals. Business Strategy and the Environment, 28(1), 244–255.
  36. Naqvi, B., Rizvi, S. K. A., Hasnaoui, A., & Shao, X. (2022). Going beyond sustainability: The diversification benefits of green energy financial products. Energy Economics, 111, 106111.
  37. Pham, L. (2021). Frequency connectedness and cross-quantile dependence between green bond and green equity markets. Energy Economics, 98, 105257.
  38. Pigorsch, U., & Schäfer, S. (2022, May 4–5). High-dimensional stock portfolio trading with deep reinforcement learning. 2022 IEEE Symposium on Computational Intelligence for Financial Engineering and Economics (CIFEr) (pp. 1–8), Helsinki, Finland.
  39. Raghunandan, A., & Rajgopal, S. (2022). Do ESG funds make stakeholder-friendly investments? Review of Accounting Studies, 27(3), 822–863.
  40. Reboredo, J. C., & Ugolini, A. (2020). Price connectedness between green bond and financial markets. Economic Modelling, 88, 25–38.
  41. Reboredo, J. C., Ugolini, A., & Aiube, F. A. L. (2020). Network connectedness of green bonds and asset classes. Energy Economics, 86, 104629.
  42. Richardson, B. J. (2009). Keeping ethical investment ethical: Regulatory issues for investing for sustainability. Journal of Business Ethics, 87(4), 555–572.
  43. Saeed, T., Bouri, E., & Tran, D. K. (2020). Hedging strategies of green assets against dirty energy assets. Energies, 13(12), 3141.
  44. Sandberg, J. (2011). Socially responsible investment and fiduciary duty: Putting the Freshfields report into perspective. Journal of Business Ethics, 101(1), 143–162.
  45. Sarkar, P., Khanapuri, V. B., & Tiwari, M. K. (2025). Integration of prediction and optimization for smart stock portfolio selection. European Journal of Operational Research, 321(1), 243–256.
  46. Song, R., Liu, L., Wei, N., Li, X., Liu, J., Yuan, J., Yan, S., Sun, X., Mei, L., Liang, Y., Li, Y., Jin, X., Wu, Y., Pan, R., Yi, W., Song, J., He, Y., Tang, C., Liu, X., … Su, H. (2023). Short-term exposure to air pollution is an emerging but neglected risk factor for schizophrenia: A systematic review and meta-analysis. Science of the Total Environment, 854, 158823.
  47. Ta, V. D., Liu, C. M., & Tadesse, D. A. (2020). Portfolio optimization-based stock prediction using long-short term memory network in quantitative trading. Applied Sciences, 10(2), 437.
  48. Ustaoglu, E., Kabadayı, M. E., & Gerrits, P. J. (2021). The estimation of non-irrigated crop area and production using the regression analysis approach: A case study of Bursa Region (Turkey) in the mid-nineteenth century. PLoS ONE, 16(4), e0251091.
  49. Yang, H., Liu, X. Y., Zhong, S., & Walid, A. (2020, October 15–16). Deep reinforcement learning for automated stock trading: An ensemble strategy. The First ACM International Conference on AI in Finance (pp. 1–8), New York, NY, USA.
  50. Yousaf, I., Suleman, T., & Demirer, R. (2022). Green investments: A luxury good or a financial necessity? Energy Economics, 105, 105745.
  51. Zhang, D., Hu, M., & Ji, Q. (2020). Financial markets under the global pandemic of COVID-19. Finance Research Letters, 36, 101528.
  52. Zhang, Y., Wang, M., Xiong, X., & Zou, G. (2021). Volatility spillovers between stock, bond, oil, and gold with portfolio implications: Evidence from China. Finance Research Letters, 40, 101786.
Figure 1. Daily price dynamics over the entire period.
Figure 2. Return distribution.
Figure 3. Heat map of pairwise Pearson correlations.
Figure 4. Cumulative return performance comparisons of non-sustainable portfolio assets using different portfolio optimization strategies for a risk aversion coefficient of λ = 0.005 and a transaction cost rate of ξ = 0.0005.
Figure 5. Cumulative return performance comparisons of mixed portfolio (sustainable + non-sustainable) assets using different portfolio optimization strategies for a risk aversion coefficient of λ = 0.005 and a transaction cost rate of ξ = 0.0005.
Figure 6. Comparison of cumulative return trends for non-sustainable vs. mixed portfolios using different portfolio optimization strategies for a risk aversion coefficient of λ = 0.005 and a transaction cost rate of ξ = 0.0005.
Figure 7. Comparison of cumulative return trends for non-sustainable vs. mixed portfolios using RTC-CNN-TD3 under different levels of risk aversion.
Table 1. Descriptive statistics.

| Asset | N | Mean | Std | Skewness | Kurtosis | JB | ADF |
|---|---|---|---|---|---|---|---|
| Natural Gas | 1894 | 0.00012 | 0.0424 | 0.1918 | 6.7706 | 3606.8314 *** | −19.9 *** |
| Brent Crude Oil | 1894 | −7.90 × 10⁻⁶ | 0.0253 | −1.3913 | 20.4092 | 33,298.4466 *** | −8.81 *** |
| S&P 500 | 1894 | 0.00033 | 0.0123 | −0.8789 | 14.6514 | 17,087.4951 *** | −12.8 *** |
| NASDAQ | 1894 | 0.00042 | 0.0148 | −0.6683 | 7.1319 | 4130.0311 *** | −12.1 *** |
| Treasury Bond ETF | 1894 | −6.00 × 10⁻⁵ | 0.01 | 0.0902 | 5.0062 | 1967.4897 *** | −14.6 *** |
| Wind | 1894 | 0.00014 | 0.0122 | −0.6652 | 11.497 | 10,509.9962 *** | −9.50 *** |
| Water | 1894 | 0.0002 | 0.0106 | −0.6615 | 13.8159 | 15,115.2803 *** | −12.5 *** |
| Solar | 1894 | 0.00032 | 0.0302 | 0.328 | 4.8586 | 1884.5616 *** | −29.9 *** |
| First Trust NASDAQ Clean Energy | 1894 | 0.00014 | 0.0231 | −0.1786 | 3.1717 | 798.1544 *** | −10.3 *** |
| S&P Green Bond Index | 1894 | −1.00 × 10⁻⁵ | 0.004 | −0.0099 | 4.5313 | 1609.5886 *** | −36.8 *** |
| S&P GSCI Precious Metals | 1894 | 0.00043 | 0.0098 | −0.3563 | 4.1238 | 1372.8933 *** | −19.3 *** |

Note: The table reports descriptive statistics for daily returns. JB is the Jarque–Bera statistic, which tests normality. ADF is the unit root test statistic, which tests stationarity. *** indicates significance at the 1% level.
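As a rough illustration of how the JB column relates to the skewness and kurtosis columns, the sketch below applies the textbook Jarque–Bera formula, JB = n/6 · (S² + K²/4), to the Natural Gas row. It assumes that the reported kurtosis is excess kurtosis; the source does not state this, and the function name is hypothetical. Under that assumption the result is close to, but not identical to, the reported value, which may reflect rounding of the inputs or a different moment estimator.

```python
def jarque_bera_from_moments(n, skewness, excess_kurtosis):
    """Textbook Jarque-Bera statistic computed from sample moments.

    Assumes `excess_kurtosis` is kurtosis minus 3 (an assumption about
    how Table 1 reports kurtosis, not something the source confirms).
    """
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


# Natural Gas row of Table 1: N = 1894, skewness = 0.1918, kurtosis = 6.7706
jb_natural_gas = jarque_bera_from_moments(1894, 0.1918, 6.7706)
print(round(jb_natural_gas, 1))  # same order of magnitude as the reported 3606.8
```

Treating the same figure as raw kurtosis (subtracting 3 first) would give a statistic roughly three times smaller, so the excess-kurtosis reading appears to be the one consistent with the table.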
Table 2. Portfolio performance measures under different optimization methods when λ = 0.005 and ξ = 0.0005.

Panel A: Non-sustainable portfolio assets

| Method | RTC-LSTM-TD3 | RTC-LSTM-DDPG | RTC-CNN-TD3 | RTC-CNN-DDPG | MV |
|---|---|---|---|---|---|
| Annual Return (%) | 8.728022 | 5.856276 | 6.796025 | 5.280492 | 1.369172 |
| Cum. Return (%) | 12.997831 | 8.666105 | 10.077737 | 7.80404 | 2.005711 |
| Annual Volatility (%) | 13.036952 | 14.506371 | 14.197204 | 15.347692 | 7.254997 |
| Sharpe Ratio | 0.669483327 | 0.403703724 | 0.478687564 | 0.344057725 | 0.188721236 |
| Max Drawdown (%) | −7.12162 | −7.915487 | −7.611277 | −8.777178 | −7.491004 |
| Calmar Ratio | 1.225567 | 0.7398503 | 0.8928889 | 0.601616 | 0.182775 |

Panel B: Mixed portfolio assets (sustainable + non-sustainable)

| Method | RTC-LSTM-TD3 | RTC-LSTM-DDPG | RTC-CNN-TD3 | RTC-CNN-DDPG | MV |
|---|---|---|---|---|---|
| Annual Return (%) | 24.179589 | 16.189242 | 18.233351 | 12.779873 | 3.372918 |
| Cum. Return (%) | 37.314497 | 24.572312 | 27.794493 | 19.25648 | 4.977358 |
| Annual Volatility (%) | 8.297603 | 8.890103 | 8.356072 | 13.948226 | 13.944672 |
| Sharpe Ratio | 2.914045056 | 1.821040994 | 2.182048096 | 0.916236445 | 0.241878619 |
| Max Drawdown (%) | −4.245709 | −4.819083 | −4.9145 | −6.327542 | −8.25027 |
| Calmar Ratio | 5.695065 | 3.359403 | 3.710113 | 2.01972 | 0.221171 |
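The ratio rows in Table 2 can be related to the return, volatility, and drawdown rows with a short calculation. The sketch below is an inference from the reported numbers, not a definition given in this section: it assumes the Sharpe ratio is annual return divided by annual volatility with a zero risk-free rate, and the Calmar ratio is annual return divided by the absolute maximum drawdown. The function names and the `risk_free_pct` parameter are hypothetical.

```python
def sharpe_ratio(annual_return_pct, annual_vol_pct, risk_free_pct=0.0):
    """Excess annual return per unit of annual volatility.

    A zero risk-free rate is assumed here because it matches the
    reported figures; the source section does not state the rate used.
    """
    return (annual_return_pct - risk_free_pct) / annual_vol_pct


def calmar_ratio(annual_return_pct, max_drawdown_pct):
    """Annual return per unit of (absolute) maximum drawdown."""
    return annual_return_pct / abs(max_drawdown_pct)


# RTC-LSTM-TD3, mixed portfolio (Table 2, Panel B)
annual_return = 24.179589
annual_vol = 8.297603
max_drawdown = -4.245709

sharpe = sharpe_ratio(annual_return, annual_vol)
calmar = calmar_ratio(annual_return, max_drawdown)
print(f"Sharpe: {sharpe:.4f}, Calmar: {calmar:.4f}")
```

Under these assumptions the two values agree with the reported 2.914 and 5.695 for this column to at least three decimal places.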
Table 3. Portfolio performance measures of non-sustainable and mixed portfolios using the RTC-LSTM-TD3 under different risk aversion measures.

Panel A: Non-sustainable portfolio assets

| Measure | λ = 0.005 | λ = 0.015 | λ = 0.03 | λ = 0.05 |
|---|---|---|---|---|
| Annual Return (%) | 8.728022 | 7.683042 | 5.914041 | 4.039317 |
| Cum. Return (%) | 12.997831 | 11.448147 | 8.777509 | 5.969782 |
| Annual Volatility (%) | 13.036952 | 12.409916 | 10.807437 | 7.204958 |
| Sharpe Ratio | 0.6694833 | 0.6191050 | 0.547219567 | 0.560630194 |
| Max Drawdown (%) | −7.12162 | −6.533904 | −5.04398 | −4.643523 |
| Calmar Ratio | 1.2255669 | 1.1758731 | 1.1724949 | 0.8698819 |

Panel B: Mixed portfolio assets (sustainable + non-sustainable)

| Measure | λ = 0.005 | λ = 0.015 | λ = 0.03 | λ = 0.05 |
|---|---|---|---|---|
| Annual Return (%) | 24.179589 | 18.800229 | 13.853037 | 9.073077 |
| Cum. Return (%) | 37.314497 | 28.692687 | 20.921805 | 13.561021 |
| Annual Volatility (%) | 8.297603 | 7.486082 | 5.614562 | 3.743041 |
| Sharpe Ratio | 2.914045056 | 2.51135761 | 2.46734064 | 2.423985471 |
| Max Drawdown (%) | −4.245709 | −4.210735 | −3.168674 | −2.119528 |
| Calmar Ratio | 5.695065 | 4.464833 | 4.371872 | 4.280705 |
Table 4. Portfolio performance measures of non-sustainable and mixed portfolios using the RTC-CNN-TD3 under different risk aversion measures.

Panel A: Non-sustainable portfolio assets

| Measure | λ = 0.005 | λ = 0.015 | λ = 0.03 | λ = 0.05 |
|---|---|---|---|---|
| Annual Return (%) | 6.796025 | 5.626478 | 4.295831 | 2.886775 |
| Cum. Return (%) | 10.077737 | 8.345323 | 5.352581 | 4.255247 |
| Annual Volatility (%) | 14.197204 | 13.96033 | 11.721723 | 8.167189 |
| Sharpe Ratio | 0.478687564 | 0.403033309 | 0.366484603 | 0.353460046 |
| Max Drawdown (%) | −7.611277 | −7.252643 | −6.247336 | −5.178468 |
| Calmar Ratio | 0.892888933 | 0.775783118 | 0.687626054 | 0.557457341 |

Panel B: Mixed portfolio assets (sustainable + non-sustainable)

| Measure | λ = 0.005 | λ = 0.015 | λ = 0.03 | λ = 0.05 |
|---|---|---|---|---|
| Annual Return (%) | 18.233351 | 13.577168 | 10.227955 | 6.777321 |
| Cum. Return (%) | 27.794493 | 20.493016 | 15.325994 | 10.078226 |
| Annual Volatility (%) | 8.356072 | 7.72929 | 6.785175 | 4.20975 |
| Sharpe Ratio | 2.182048 | 1.756587 | 1.507397 | 1.609911 |
| Max Drawdown (%) | −4.9145 | −4.727053 | −4.484241 | −2.91062 |
| Calmar Ratio | 3.710113 | 2.872227 | 2.280866 | 2.32848 |
Share and Cite

MDPI and ACS Style

Ben Hamadou, F.; Boujelbène Abbes, M. Sustainable vs. Non-Sustainable Assets: A Deep Learning-Based Dynamic Portfolio Allocation Strategy. J. Risk Financial Manag. 2025, 18, 563. https://doi.org/10.3390/jrfm18100563
