Two-Stage Distributionally Robust Optimization for an Asymmetric Loss-Aversion Portfolio via Deep Learning

Zhang, Xin; Liu, Shancun; Pan, Jingrui

doi:10.3390/sym17081236

Open AccessArticle

Two-Stage Distributionally Robust Optimization for an Asymmetric Loss-Aversion Portfolio via Deep Learning

by

Xin Zhang

,

Shancun Liu

^* and

Jingrui Pan

School of Economics and Management, Beihang University, No. 37, Xueyuan Road, Beijing 100191, China

^*

Author to whom correspondence should be addressed.

Symmetry 2025, 17(8), 1236; https://doi.org/10.3390/sym17081236

Submission received: 25 June 2025 / Revised: 26 July 2025 / Accepted: 30 July 2025 / Published: 4 August 2025

(This article belongs to the Section Computer)

Download

Browse Figures

Versions Notes

Abstract

In portfolio optimization, investors often overlook asymmetric preferences for gains and losses. We propose a distributionally robust two-stage portfolio optimization (DR-TSPO) model, which is suitable for scenarios where the loss reference point is adaptively updated based on prior decisions. For analytical convenience, we further reformulate the DR-TSPO model as an equivalent second-order cone programming counterpart. Additionally, we develop a deep learning-based constraint correction algorithm (DL-CCA) trained directly on problem descriptions, which enhances computational efficiency for large-scale non-convex distributionally robust portfolio optimization. Our empirical results obtained using global market data demonstrate that during COVID-19, the DR-TSPO model outperformed traditional two-stage optimization in reducing conservatism and avoiding extreme losses.

Keywords:

loss aversion; portfolio; distributionally robust two-stage optimization; deep learning algorithm; decision-dependent loss reference

1. Introduction

In recent years, the increasing frequency of geopolitical conflicts, global supply chain disruptions, and macroeconomic fluctuations have significantly heightened the uncertainty and complexity of financial markets. On the one hand, the complexity of investor behavior complicates the construction of quantitative portfolios. Moreover, market uncertainty severely impacts asset price stability and amplifies investors’ sensitivity to risk. Researchers have observed that during periods of intense market volatility, traditional portfolio theories often overlook the asymmetric sensitivity of investors to gains and losses. Kahneman and Tversky [1] provided an alternative theoretical framework for decision making under such uncertainty. The decision model based on prospect theory incorporates new features, such as (i) reference dependence, where decision-makers evaluate outcomes as gains or losses relative to a reference point that fluctuates with market conditions and wealth [2], (ii) asymmetric utility, as Zhang and Semmler [3] demonstrated that previous gains and losses in the stock market have an asymmetric impact on investment behavior, and (iii) different probabilities for evaluating gains and losses. These features lead to highly nonlinear investor utility, posing significant challenges in modeling and solving portfolio optimization problems from this perspective.

In modeling loss aversion for portfolio optimization, the focus is on characterizing the updating mechanism of the loss reference point. The application of a static reference point, set subjectively by the decision-maker, is quite limited [4,5] and struggles to adapt to dynamic market conditions. A widely accepted view is that the loss reference point updates in an adaptive manner [6,7]. Our work builds on the literature of behavioral portfolio selection based on prospect theory, which includes notable studies such as [4,8,9,10,11,12,13,14], among others. However, we further extend this framework by incorporating a dynamic, decision-dependent reference point, and we model the uncertainty of the loss reference point through ambiguity sets in the distributionally robust optimization (DRO) framework. In recent years, DRO has been widely applied in uncertainty modeling, as it avoids unreasonable assumptions about the distributional form of random variables and provides robust solutions through worst-case analysis when precise distribution information is unavailable. Within the DRO framework, we propose an update mechanism for the decision-dependent loss aversion reference point, allowing the reference point to adapt dynamically to market performance and investor behavior. We assume that the distribution of the random loss reference point depends solely on the investor’s prior decisions and is independent of asset returns, without requiring additional commitments to the specific distribution form. Specifically, the difference between prior decision returns and market returns influences the expected value of the second-stage random loss reference point, while the difference in market weights and prior decisions affects the variance of the loss reference point, as illustrated in Figure 1. When a loss occurs, investors compare their performance to that of the market or other investors, which in turn affects their risk preferences, decision making, and expectations for future returns. This “social comparison effect” stems from a common psychological mechanism in human social behavior, where individuals tend to assess their own performance by comparing it to others, something that particularly evident in financial decision making [15].

We present the following three main contributions to the distributionally robust two-stage optimization portfolio (DR-TSPO) problem under loss aversion:

We propose an updated mechanism for the loss reference point based on prior decisions, which adapts to market fluctuations and investor behavior. This mechanism captures how investors dynamically respond to market changes. We also derive the equivalent dual of the DR-TSPO problem, transforming the original problem into a second-order cone programming problem that is easier to implement, providing a solid theoretical foundation for algorithm design and practical applications.
We develop a deep learning-based constraint correction algorithm (DL-CCA) to solve complex optimization problems with nonlinear and non-convex constraints. Specifically, the innovation of this method lies in training the neural network directly from the problem’s specifications, rather than from an existing supervised dataset, effectively handling complex non-convex constraints. Experimental results show that the DL-CCA algorithm, leveraging fully connected neural networks, outperforms Trust-Constr, HO, and LSTM/CNN-based variants in solving large-scale constrained non-convex problems, achieving superior average optimal objective values (0.0029) and faster solution times (76.78 s).
We validate the advantages of the loss aversion Distributionally Robust Two-Stage Portfolio Optimization (DR-TSPO) model in dealing with loss and uncertainty using global key stock index component data. The experimental results show that the DR-TSPO model exhibits strong robustness and lower drawdown under extreme market conditions (such as the 2020 COVID-19 pandemic). For instance, in the Chinese market, the DR-TSPO’s annual return is 0.4635, significantly higher than the TSPO’s 0.3020, with lower volatility (DR-TSPO: 0.4430 vs. TSPO: 0.5846), demonstrating stronger capital protection ability.

This study deeply combines behavioral finance theory with modern optimization methods. On the one hand, the decision-dependent loss reference point mechanism enriches the application scenarios of prospect theory in asset allocation. At the same time, the DL-CCA algorithm provides a universal solution framework for complex optimization problems in financial engineering, which has enlightening significance for technological upgrading in the fields of robo-advisory and risk management. In the future, it can be extended to more complex scenarios such as multi-stage investment decisions and cross-border asset allocation.

The remainder of the paper is organized as follows. Section 2 reviews related work on loss aversion and distributionally robust optimization. Section 3 introduces the two-stage portfolio optimization (TSPO) model with stochastic loss reference points. Section 4 develops the loss aversion-based DR-TSPO model and derives its tractable reformulation. In Section 5, we propose a DL-CCA algorithm for solving large-scale DR-TSPO problems. Section 6 is the algorithm comparison experiment, including efficiency comparison and ablation study. Section 7 designs empirical experiments using real data from global key index constituents and analyzes the results. The experimental findings demonstrate that, compared to conventional two-stage optimization models, the loss aversion-based DR-TSPO exhibits higher robustness and adaptability. Finally, Section 8 concludes the paper and discusses future research directions.

2. Related Work

Since the construction of the Markowitz classical portfolio theory [16], portfolio construction methods based on mathematical statistics have developed rapidly. However, these models rely on the assumption that all investors are well informed and rational with no bias sentiment. In the 1980s, scholars began to focus on the impact of loss aversion on portfolios. This section provides a comprehensive review of loss aversion in financial investments and the DRO method.

2.1. Loss Aversion in Financial Investments

Loss aversion, as a crucial component of prospect theory, provides a novel perspective for investigating investor preferences and portfolio construction [2]. By incorporating loss aversion into portfolio construction, we can account for investors’ loss preferences, explain the asymmetric impact of prior gains and losses in the stock market on investment behavior, and thereby mitigate decision-making errors caused by cognitive biases [3]. Regarding investment strategies for loss-averse investors, Berkelaar et al. [8] demonstrated that over shorter investment horizons (e.g., less than five years), loss-averse investors significantly reduce the initial portfolio weight allocated to stocks compared to investors with smooth power utility. However, loss aversion and risk aversion are empirically difficult to distinguish clearly, necessitating the examination of individual investor trading behavior through the lens of utility derived from realized gains and losses [4]. Based on prospect theory, Jin and Yu Zhou [9] developed and analyzed a portfolio selection model featuring an S-shaped utility (value) function and probability weighting. The reference point for losses serves as a key element in this model for measuring loss utility, functioning as a standard or benchmark to delineate gains from losses. However, the specific mechanisms by which decision-makers form and update reference points remain insufficiently understood. Baucells et al. [17] demonstrated through experiments that this process is not a simple recursive procedure. Shi et al. [13] integrated the adaptive process of reference points with investors’ perception of past gains and losses, constructing a dynamic trading model featuring reference point adaptation and loss aversion, and derived its semi-analytical solution.

Notably, existing research on portfolio optimization with reference point updating typically assumes that decision-makers can anticipate the evolution of reference points and thereby resolve the time inconsistency issue in dynamic optimization. Strub and Li [18] compared optimal investment strategies under different reference point updating rules within time-consistent and time-inconsistent frameworks, providing empirical evidence that decision-makers often struggle to foresee the updating process of reference points. van Bilsen and Laeven [19] further highlighted that loss-averse individuals endogenously update their reference levels over time and distort probabilities, with experimental findings showing that investors with prospect-theoretic preferences tend to adopt more conservative portfolio strategies and exhibit lower sensitivity of optimal consumption strategies to economic shocks. He and Strub [20] examined the impact of different partially endogenous reference point generation models on optimal portfolio decisions under loss aversion. Gao et al. [7] explored the behavioral characteristics of loss-averse investors with dynamically adjusted reference points in market environments with serially correlated returns, offering new insights into investor decision making in complex market settings.

Nevertheless, current research on the application of loss aversion in portfolio construction still exhibits notable limitations. For example, scholars frequently adopt static loss aversion parameters in their models, failing to dynamically capture the inherent ambiguity in loss reference point distributions. Additionally, the fixed structure of ambiguity sets cannot adequately reflect investors’ differential dependence on realized returns versus target returns. These methodological constraints motivate us to refine the quantitative modeling framework for loss reference points while simultaneously developing more robust portfolio optimization approaches.

2.2. Distributionally Robust Portfolio Optimization

Distributionally robust optimization (DRO) methods have been widely applied to uncertainty analysis in portfolio selection, focusing on the uncertainty of random variables rather than assuming known variables or their distributions [21,22,23]. Furthermore, DRO can effectively mitigate the impact of outliers and reduce the interference of ambiguity in portfolio construction, thereby enhancing robustness and flexibility. Building upon this optimization framework, we aim to relax the stringent requirements regarding loss aversion reference points in portfolio modeling, such as regulator-prescribed reference points or uncertainty considerations under known distributions. Generally, distributionally robust optimization assumes that the distribution of random variables belongs to a well-defined ambiguity set. Garlappi et al. [24] posited that investors’ ambiguity aversion manifests as mean returns belonging to an ellipsoidal uncertainty set, studying robust mean-variance optimization under parameter and model uncertainty. Under ambiguity sets such as mixture distributions, box-type, and ellipsoidal sets, researchers have proposed robust portfolios based on minimizing worst-case conditional value-at-risk (CVaR) [25,26,27]. Empirical results demonstrate that portfolios constructed via this method exhibit superior diversification, stability, expected returns, and turnover compared to non-robust approaches.

However, traditional ambiguity sets often include excessive inappropriate discrete distributions. To address this, we consider data-driven ambiguity sets (e.g., Wasserstein,

φ

-divergence), which leverage data adaptiveness, statistical rigor, and tail risk control to overcome the over-conservatism of conventional ambiguity sets. In this regard, Pflug and Wozabal [28] introduced the Wasserstein distance to describe portfolio ambiguity sets, which was later extended by Wozabal [29] to more general cases. Gao et al. [30] derived a worst-case expectation representation for Wasserstein-based ambiguity sets, applied to portfolio construction centered on empirical measures with different risk metrics. Blanchet et al. [31] proposed a DRO model incorporating return and variance uncertainty distributions using the Wasserstein metric. Existing studies primarily focus on single-stage distributionally robust portfolio construction, motivating us to extend this research trajectory—particularly by incorporating the decision-dependent nature of loss aversion introduced in this study. The first-stage decision must be made before observing the realization of random parameters, while the second-stage decision involves adjustments or compensations based on the first-stage outcome after observing the actual realizations. Our proposed two-stage distributionally robust portfolio optimization framework with loss aversion not only integrates behavioral finance into quantitative portfolio research but also relaxes the constraint of regulator-imposed reference points by leveraging DRO properties.

Additionally, solving DRO problems requires mapping the primal problem to the dual space via duality theory. Since uncertainty sets (e.g., moment-based, Wasserstein distance, or

ϕ

-divergence) may introduce non-convex or even non-smooth constraints, large-scale DRO problems often face severe computational challenges. Even with convex relaxation or stochastic optimization techniques, existing methods struggle to obtain high-quality feasible solutions within reasonable timeframes, especially in high-dimensional or dynamic settings where computational complexity grows exponentially, severely limiting DRO’s practical deployment. To address this, we incorporate deep learning to design heuristic algorithms that enhance the solution efficiency of non-convex robust models in behavioral portfolio optimization.

3. Basic Two-Stage Portfolio Optimization Model with Decision-Dependent Loss Aversion

In the investment process, assume that the investor allocates all assets to the stock market, and the current portfolio weight vector is

x = (x_{1}, x_{2}, \dots, x_{n})

, where

x \in X

is a feasible portfolio weight vector. The stock return vector

r

follows a normal distribution

r \sim N (μ, Σ)

, so the portfolio return

R (x)

can be expressed as:

R (x) = r^{⊤} x .

(1)

The expected return and risk of the portfolio are, respectively, given by

E [R (x)] = μ^{⊤} x; C o v [R (x)] = x^{⊤} Σ x .

(2)

Let

μ

denote the vector of expected returns of the stocks, and

Σ

represent the covariance matrix of the returns.

For an investor with loss aversion preferences, assume that the loss aversion coefficient is

φ

and there is a psychological reference point ℓ (the loss reference point) to evaluate the portfolio’s gains and losses. When the portfolio return falls below the reference point ℓ, the investor’s utility decreases. Specifically, the

τ

-order loss aversion utility is defined as:

φ E_{l \sim P} {({[l - μ^{⊤} x, 0]}_{+})}^{τ} .

(3)

where

{[\cdot, 0]}_{+}

denotes the non-negative part (i.e., when a loss occurs, the investor experiences loss aversion; when the portfolio return exceeds the reference point, the loss aversion effect is zero). In practice, the assessment of losses is closely related to the prior investment decision

y \in X

. Therefore, consider the loss reference point ℓ as a random variable, where

l \sim P_{(y)}

.

Two-stage portfolio optimization is used in various financial and investment scenarios, aiming to enhance portfolio performance, reduce risk, and adapt to changing market conditions. It is applicable to optimization scenarios where decision making is dependent on random variables. Assume the decision set

X

is related to market random events, and the investor’s prior decisions reflect potential losses the investor may face under specific circumstances. In this framework, based on different random events k, the decision-maker has a determined investment decision

y \in X_{(k)}, y = (y_{1}, y_{2}, \dots, y_{n})

and a loss reference point ℓ. Consider the two-stage portfolio optimization problem (TSPO) under a quadratic loss utility function. Let the investor’s utility function be defined as

U (x, l) = μ^{⊤} x - c \sum_{i \in [N]} | x_{i} - y_{i} | - φ \cdot {[l - μ^{⊤} x, 0]}_{+}^{2},

where c is the transaction cost coefficient. Under the constraint of no short-selling, the two-stage portfolio optimization problem for a loss-averse investor is:

\begin{matrix} (T S P O) : max_{y \in X} μ^{⊤} y + E_{l \sim P_{(y)}} [max_{x \in X} U (x, l)], \end{matrix}

(4)

\begin{matrix} s . t . \sum_{i \in [N]} x_{i} = 1, \sum_{i \in [N]} y_{i} = 1, \end{matrix}

(5)

\begin{matrix} l \in L (y), x_{i}, y_{i} \in [0, 1], \forall i \in [N] . \end{matrix}

(6)

After the decision variable

y

is made in the first stage, the investor determines ℓ based on the realized gains and losses. The variable

x

represents the decision variable in the second stage, which is made after the random loss reference point ℓ is determined. Under the decision

y

, the realization of ℓ can be determined by accessing a discrete probability distribution

{\hat{P}}_{(y)}

, which contains

[S]

samples of loss reference points. This is defined as:

{\hat{P}}_{(y)} (l) = \sum_{s = 1}^{S} p_{s} (y) \cdot δ (l - {\hat{l}}_{s} (y)) .

(7)

where:

-: ${\hat{p}}_{s} (y)$ is the probability of the s-th sample, satisfying ${\hat{p}}_{s} (y) > 0$ and $\sum_{s \in [S]} {\hat{p}}_{s} (y) = 1$ , with the general assumption of equal probability.
-: ${\hat{l}}_{s} (y)$ is the s-th loss reference point sample in the set determined by decision $y$ .
-: $δ (l - {\hat{l}}_{s} (y))$ is the Dirac delta function, which represents a probability mass at the point of the discrete distribution.

The discrete loss reference point samples

{\hat{l}}_{s} (y)

can be obtained from historical data, expert knowledge, or by extracting reference distribution samples based on prior decision characteristics. This study primarily focuses on adaptive optimization methods, so the samples will be directly extracted based on the characteristics of the decision

y

.

We assume that for a finite set of events k, there is a unique decision

y

and a corresponding set of random loss reference point samples

\hat{l} \in {\hat{L}}_{k}

. The expected loss values of these distributions should be adjusted based on the nature of the events and the psychological expectations of the investors, ensuring that the expected values are ordered from largest to smallest, i.e.,

E [{\tilde{l}}_{1}] \leq E [{\tilde{l}}_{2}] \leq \dots \leq E [{\tilde{l}}_{K}] .

This structure ensures that the event partitions not only reflect the loss aversion sentiment of market participants but also provide a clear, stepwise basis for the expected loss of the reference points. For example, according to expert opinions and historical statistics, we classify random market states into five categories of events according to the varying levels of loss aversion among market participants. Four thresholds

T_{1} < T_{2} < T_{3} < T_{4}

are set, which are used to define the event classification criteria. Specifically, we define the event set

X_{k} = {y_{1}, y_{2}, y_{3}, y_{4}, y_{5}}

, where each event

y_{i}

(

i \in {1, 2, 3, 4, 5}

) corresponds to the following:

Event $y \in X_{1}$ : High loss aversion, satisfying $E [\hat{l}] \leq T_{1}$ . Investors have suffered significant losses in the past or experienced a large gap in returns compared to the benchmark portfolio. This leads to high loss aversion, causing investors to set a lower reference loss $\hat{l}$ for the new decision round. Such events are often accompanied by sharp market declines, where some investors may underestimate the market’s recovery potential, resulting in overly pessimistic expectations. Extreme outliers cause higher sample variance.
Event $y \in X_{2}$ : Moderate-high loss aversion, satisfying $T_{1} < E [\hat{l}] \leq T_{2}$ . Investors may have experienced some losses, but the overall loss is smaller or the return difference with the benchmark portfolio is less significant, resulting in lower loss aversion. The sample variance is smaller.
Event $y \in X_{3}$ : Moderate loss aversion, satisfying $T_{2} < E [\hat{l}] \leq T_{3}$ . Investors may have followed a benchmark-tracking strategy, with returns similar to or nearly identical to the benchmark, resulting in little additional loss.
Event $y \in X_{4}$ : Moderate-low loss aversion, satisfying $T_{3} < E [\hat{l}] \leq T_{4}$ . Investors have achieved some excess returns compared to the benchmark portfolio and, due to a small deviation from market strategies, exhibit some degree of risk aversion. The new reference loss $\hat{l}$ is positive but relatively low. The sample variance for the current market state is also low.
Event $y \in X_{5}$ : Low loss aversion, satisfying $E [\hat{l}] > T_{4}$ . Investors have made significant portfolio adjustments or earned returns higher than the market. The new reference loss $\hat{l}$ is high. Some investors may exhibit overconfidence, where overestimating their own abilities influences their decisions and expectations, causing outlier sample variance among those pursuing higher returns.

However, a finite set of random events is insufficient to explain the complex market states. When the threshold T is infinitely subdivided (or the time intervals

(T_{k - 1}, T_{k}]

are sufficiently small), market uncertainty is modeled as an infinite number of events (

k \to \infty

) and an infinite number of feasible decisions

y

. By representing the sample set

\hat{L}

as a continuous functional relationship of random events

{\hat{L}}_{k} = f (y)

, we introduce a mapping function

f : R^{n} \to M_{2} \subseteq R^{2}

. This maps the decision of random events to the reference distribution information space of loss reference points

M_{2} = \{\hat{P} ∣ E [\hat{l}] = μ_{\hat{l}}, Var [\hat{l}] = σ_{\hat{l}}^{2}\}

, and then generates discrete samples

{\hat{l}}_{s}

, for

s = 1, \dots, S

, according to the specific distribution moments. For each random event, the following relationship is set based on the reference distribution mean

E [\hat{l}]

and variance

Var [\hat{l}]

:

\begin{matrix} E [\hat{l}] = β \cdot (μ^{⊤} y - μ^{⊤} y_{m k t}), \end{matrix}

(8)

\begin{matrix} Var [\hat{l}] = \sqrt{γ} \cdot {∥ y - y_{m k t} ∥}_{2}^{2} . \end{matrix}

(9)

where

β

and

γ

serve as adjustment coefficients, while

y_{m k t}

represents the market portfolio weights. The term

(μ^{⊤} y - μ^{⊤} y_{m k t})

reflects the investor’s prior gains or losses. When a loss occurs,

E [\hat{l}] < 0

; otherwise, if there is no loss,

E [\hat{l}] \geq 0

. The norm distance

∥ y - y_{m k t} ∥_{2}

measures the degree of deviation between the investor’s portfolio and the market portfolio, indicating the level of active management. In an efficient market, where information disseminates rapidly and prices adjust swiftly to all available information, investors tend to adopt passive investment strategies to track market indices. In this environment, active management struggles to generate consistent excess returns. Additionally, in low-volatility markets, investors are more inclined toward passive management

∥ y - y_{m k t} ∥_{2} \to 0

, aiming for relatively stable returns. Conversely, in an inefficient market, information asymmetry and the market’s failure to fully reflect fundamentals make active management strategies more attractive, as investors can exploit these inefficiencies to achieve excess returns. In high-volatility markets, where uncertainty and price fluctuations are significant, investors also prefer active management

∥ y - y_{m k t} ∥_{2} ↛ 0

to seize short-term investment opportunities.

Specifically, let the sample vector

\hat{l}

be:

\begin{matrix} \hat{l} = μ^{⊤} [β \cdot (y - y_{m k t}) \otimes 1_{S}^{⊤} + 1_{N} \otimes ε^{⊤}], \end{matrix}

(10)

\begin{matrix} ε \overset{i . i . d .}{\sim} F \{ε : E [ε] = 0, C o v [ε] = γ \cdot ∥ y - y_{m k t} ∥_{2}^{2} \cdot I_{S}\} . \end{matrix}

(11)

Here, “⊗” represents the tensor product of vectors, and

1_{N}

is a vector of size N with all elements equal to 1. The noise vector

ε

of dimension S consists of independent and identically distributed (i.i.d.) elements that follow a multivariate distribution given by

\{ε : E [ε] = 0, Cov [ε] = γ \cdot ∥ y - y_{m k t} ∥_{2}^{2} \cdot I_{S}\}

, where

I_{S}

is the S-dimensional identity matrix. The larger the prior loss, the higher the overall expected value of the sample. The greater the deviation of the decision from the market, i.e., the larger

∥ y - y_{m k t} ∥_{2} ↛ 0

, the higher the uncertainty regarding the loss reference, resulting in a larger overall variance. Define the sample set as:

\begin{matrix} \hat{L} (y) = \{{\hat{l}}_{s} : {\hat{l}}_{s} = β \cdot μ^{⊤} (y - y_{m k t}) + ε_{s} \cdot 1_{S}^{⊤} μ, \forall s \in [S]\} . \end{matrix}

(12)

According to the sample estimate, the TSPO can be approximated as a single-stage optimization problem.

\begin{matrix} (TSPO) : \\ max_{x, y, \in X, ε} & μ^{⊤} (y + x) - c \sum_{i \in [N]} |x_{i} - y_{i}| - φ \cdot \sum_{s \in [S]} {[β \cdot μ^{⊤} (y - y_{m k t}) + ε_{s} \cdot 1_{S}^{⊤} μ - μ^{⊤} x, 0]}_{+}^{2}, \end{matrix}

(13)

\begin{matrix} s . t . & \frac{1}{S - 1} \sum_{s \in [S]} ε_{s}^{2} = γ \cdot {∥ y - y_{m k t} ∥}_{2}^{2}, \end{matrix}

(14)

\begin{matrix} \sum_{i \in [N]} x_{i} = 1, \sum_{i \in [N]} y_{i} = 1, \sum_{s \in [S]} ε_{s} = 0, \end{matrix}

(15)

\begin{matrix} x_{i}, y_{i} \in [0, 1], \forall i \in [N] . \end{matrix}

(16)

By introducing auxiliary variables

ζ_{s}, \forall s \in [S]

, the problem becomes equivalent to a second-order cone programming (SOCP) formulation.

\begin{matrix} (TSPO) : max_{x, y, \in X, ε, ζ} & μ^{⊤} (y + x) - c \sum_{i \in [N]} |x_{i} - y_{i}| - φ \cdot \sum_{s \in [S]} ζ_{s}^{2}, \end{matrix}

(17)

\begin{matrix} s . t . & \frac{1}{S - 1} \sum_{s \in [S]} ε_{s}^{2} = γ \cdot {∥ y - y_{m k t} ∥}_{2}^{2}, \end{matrix}

(18)

\begin{matrix} \sum_{i \in [N]} x_{i} = 1, \sum_{i \in [N]} y_{i} = 1, \sum_{s \in [S]} ε_{s} = 0, \end{matrix}

(19)

\begin{matrix} ζ_{s} \geq β \cdot μ^{⊤} (y - y_{m k t}) + ε_{s} \cdot 1_{S}^{⊤} μ - μ^{⊤} x, \forall s \in [S], \end{matrix}

(20)

\begin{matrix} x_{i}, y_{i} \in [0, 1], \forall i \in [N], ζ_{s} \geq 0, \forall s \in [S] . \end{matrix}

(21)

The definitions of the notations are presented in the Appendix A.

4. DR-TSPO Model

In fact, the reference distribution

\hat{P}

may not accurately reflect the real situation, especially in scenarios with sparse data or noise. This bias can lead to suboptimal decisions in practical applications, increasing potential risks. To address the uncertainty in real-world distributions, two-stage distributionally robust optimization (DRO) adopts a more conservative approach. By constructing a distributional ambiguity set that encompasses all possible true distributions, the DRO optimization scheme targets the worst-case distribution for optimization. This “worst-case” approach effectively mitigates the impact of the reference distribution deviating from the true distribution, providing more robust decisions. Additionally, in high-uncertainty situations, it better balances risk and return, offering more reliable support for real-world decision making.

To account for the uncertainty of the loss reference point ℓ, we introduce an ambiguity set based on the Wasserstein distance to characterize the distribution of the loss reference point. Specifically, the Wasserstein distance is used to measure the gap between the true distribution and the reference distribution. The uncertainty set

B ({\hat{P}}_{(y)})

for the true distribution is defined as:

\begin{matrix} B ({\hat{P}}_{(y)}) = \{P \in P_{0} (L) | \begin{matrix} l \sim P, \hat{l} \sim {\hat{P}}_{(y)}, \\ d_{W} (P, {\hat{P}}_{(y)}) \leq ϵ, ϵ \in R^{+} \end{matrix}\} . \end{matrix}

(22)

where the set

P_{0} (L)

represents the collection of all Borel probability distributions on

L \subseteq R^{P}

, where

L

is a prescribed cone-shaped representable set.

{\hat{P}}_{(y)}

is the reference distribution of the loss reference point determined by the prior decision

y

. Although this reference distribution is discrete, the ambiguity set can encompass both discrete and continuous distributions.

d_{W} (\cdot, \cdot)

is the Wasserstein distance metric between two distributions, which measures the minimum cost required to transform one distribution into another, typically understood as the “transportation cost” in geographical space. The Type-2 Wasserstein distance, through its quadratic penalty mechanism, enables more refined control over higher-order distributional characteristics. This proves particularly advantageous when balancing mean–variance trade-offs, handling extreme events, or addressing high-dimensional correlations. However, its computational cost may be significantly higher, necessitating careful consideration of the optimal order selection based on specific problem requirements.

To construct a concrete optimization model and link the ambiguity set to the prior decision

y

, we use the ambiguity set based on the Wasserstein distance to capture the distributional uncertainty of ℓ. The size of this ambiguity set can be adjusted by the parameter

ϵ

, which represents the radius of the ambiguity sphere and controls the degree of uncertainty. The specific definition of the Wasserstein distance is:

\begin{matrix} d_{W} (P, {\hat{P}}_{(y)}) = inf_{Π \in Γ (P, {\hat{P}}_{(y)})} \int_{L \times L} {∥ l - \hat{l} ∥}_{2}^{2} d Π (l, \hat{l}), \end{matrix}

(23)

where

Γ (P, {\hat{P}}_{(y)})

denotes the set of all possible joint distributions, with marginal distributions

P

and

{\hat{P}}_{(y)}

, respectively. ℓ and

\hat{l}

are samples drawn from these distributions. This metric ensures that the Wasserstein distance between any possible distribution

P

and the reference distribution

{\hat{P}}_{(y)}

does not exceed

ϵ

, thereby introducing uncertainty management.

When considering transaction costs, the investor’s objective is to minimize both the loss and some necessary costs. Specifically, the investor needs to consider not only the expected return

μ^{⊤} x

, but also loss aversion, ambiguity losses, and upper bounds on transaction costs. In this case, the investor’s objective function can be expressed as:

\begin{matrix} max_{\binom{y \in X,}{\hat{l} \in \hat{L} (y), ε \in R^{S}}} μ^{⊤} y - min_{x \in X} \{c \sum_{i \in [N]} |x_{i} - y_{i}| - μ^{⊤} x + sup_{\binom{l \sim P,}{P \in B ({\hat{P}}_{(y)})}} φ \cdot E_{P} {[l - μ^{⊤} x, 0]}_{+}^{2}\} . \end{matrix}

(24)

This objective function combines the prior decision, loss aversion preferences, and market transaction costs, aiming for effective asset allocation by maximizing the net returns of the two-stage portfolio under the worst-case scenario. Considering the two-stage distributionally robust portfolio optimization problem (DR-TSPO) under a quadratic loss utility function, the optimization problem can be expressed as:

\begin{matrix} (DR - TSPO) : \\ max_{\binom{y \in X,}{\hat{l} \in \hat{L} (y), ε \in R^{S}}} & μ^{⊤} y - min_{x \in X} \{c \sum_{i \in [N]} |x_{i} - y_{i}| - μ^{⊤} x + sup_{\binom{l \sim P,}{P \in B ({\hat{P}}_{(y)})}} φ \cdot E_{P} {[l - μ^{⊤} x, 0]}_{+}^{2}\}, \end{matrix}

(25)

\begin{matrix} s . t . & \hat{l} = β \cdot μ^{⊤} [(y - y_{m k t}) \otimes 1_{S}^{⊤} + 1_{N} \otimes ε^{⊤}], \end{matrix}

(26)

\begin{matrix} \frac{1}{S - 1} \sum_{s \in [S]} ε_{s}^{2} = γ \cdot {∥ y - y_{m k t} ∥}_{2}^{2}, \end{matrix}

(27)

\begin{matrix} \sum_{i \in [N]} x_{i} = 1, \sum_{i \in [N]} y_{i} = 1, \sum_{s \in [S]} ε_{s} = 0, \end{matrix}

(28)

\begin{matrix} x_{i}, y_{i} \in [0, 1], \forall i \in [N] . \end{matrix}

(29)

Here, c is the cost coefficient for buying and selling stocks. The utility model objective in Equation (25) considers the possible true distribution of the loss reference point ℓ based on the prior decision and aims to maximize the net returns of the two-stage portfolio in the worst-case scenario. Unlike two-stage robust optimization, the distributionally robust optimization framework effectively avoids “over-conservatism,” achieving a more balanced result in practical applications. It is particularly suitable for situations where uncertainty is high or difficult to quantify directly. However, DR-TSPO has a higher computational complexity because it requires handling probability distributions, optimizing distributional uncertainty, and possibly estimating distribution parameters. These challenges necessitate the design of appropriate algorithms to address the distributional uncertainties.

According to hierarchical optimization and dual theory, we reformulate the DR-TSPO into a more manageable deterministic two-stage second-order cone programming.

Theorem 1.

The DR-TSPO is equivalent to solving a deterministic two-stage nonlinear constrained optimization problem.

\begin{matrix} max_{x, y, ω, ν, θ, ε} & μ^{⊤} y - θ, \end{matrix}

(30)

\begin{matrix} s . t . & \frac{1}{S} \sum_{s \in [S]} ω_{s} + ν ϵ - μ^{⊤} x + c \sum_{i \in [N]} |x_{i} - y_{i}| \leq θ, \end{matrix}

(31)

\begin{matrix} (\frac{1}{φ} - \frac{1}{ν}) \cdot ω_{s} \geq {(β \cdot μ^{⊤} (y - y_{m k t}) + ε_{s} \cdot 1_{S}^{⊤} μ - μ^{⊤} x)}^{2}, \forall s \in [S], \end{matrix}

(32)

\begin{matrix} \frac{1}{S - 1} \sum_{s \in [S]} ε_{s}^{2} = γ \cdot {∥ y - y_{m k t} ∥}_{2}^{2}, \end{matrix}

(33)

\begin{matrix} \sum_{i \in [N]} x_{i} = 1, \sum_{i \in [N]} y_{i} = 1, \sum_{s \in [S]} ε_{s} = 0, \end{matrix}

(34)

\begin{matrix} x_{i}, y_{i} \geq 0, \forall i \in [N], w_{s} \geq 0, 0 \leq φ \leq v . \end{matrix}

(35)

The proof is shown in Appendix B.

The DR-TSPO optimization problem is a multi-variable, multi-constraint non-convex optimization problem. The non-convexity primarily arises from the quadratic nonlinear term in constraint

(32)

and the complex coupling of variables (such as the nonlinear dependence on

\frac{1}{ν}

). Additionally, the variance equality constraint

(33)

for the scenario variable

ε

and the absolute value term

c \sum_{i \in [N]} | x_{i} - y_{i} |

further complicate the solution process. In practical applications, due to the large number of stocks N and scenarios S, the problem’s scale significantly increases, resulting in high computational costs. Therefore, it is necessary to design appropriate algorithms that reduce computational complexity by relaxing constraints, decomposing the problem, or introducing penalty terms, while ensuring feasibility and obtaining high-quality approximate solutions quickly.

5. Deep Learning-Based Constraint Correction Algorithm

To reduce the resource usage and time cost associated with solving non-convex constraints in large-scale optimization problems, we design a deep learning-based constraint correction algorithm (DL-CCA) for non-convex constrained optimization. Beyond the traditional optimization literature, substantial research efforts in deep learning have focused on developing approximations or acceleration techniques for optimization models. As evidenced by comprehensive reviews in fields like combinatorial optimization [32] and optimal power flow [33], current machine learning applications for optimization acceleration primarily follow two distinct methodologies.

One methodology, conceptually similar to surrogate modeling techniques [34], trains machine learning models to directly predict complete solutions from optimization inputs. Nevertheless, these methods frequently encounter challenges in generating solutions that simultaneously satisfy feasibility and near-optimality conditions. Alternatively, a second methodology integrates machine learning within optimization frameworks, either in conjunction with or embedded in the solution process. Examples include learning effective warm-start initializations [35,36] or employing predictive models to identify active constraints, thereby enabling constraint reduction strategies [37].

In this work, we consider solving a series of optimization problems where the objective or constraints differ across instances. Formally, let

z \in R^{m}

represent the solution to the corresponding optimization problem. For any given parameters, our goal is to find the optimal solution

z^{*}

for

z

:

\begin{matrix} min_{z \in R^{m}} f (z), s . t . g (z) \leq 0, h (z) = 0, \end{matrix}

(36)

Here, f, g, and h may be nonlinear and non-convex. We consider using deep learning methods to solve this task—specifically, training a neural network

N_{ϑ}

parameterized by

ϑ

to adjust the multi-dimensional random solution

\hat{z}

into an approximate optimal solution that satisfies the constraints

g (z)

and

h (z)

. This approach allows the integration of difficult non-convex constraints into the neural network training process. The method enables training directly from the problem specification (rather than a supervised dataset). Additionally, we incorporate equality constraint adjustment layers at both the input and output of the neural network model to adjust the form of partial solutions that satisfy the equality constraints. The algorithm learns to minimize a composite loss that includes the objective and two “soft loss” terms, which represent penalties for violating the equality and inequality constraints (

λ_{g}, λ_{h} ≫ 0

):

\begin{matrix} L_{soft} (N_{ϑ} (z)) = f (N_{ϑ} (z)) + λ_{g} \cdot ∥ ReLU (g (N_{ϑ} (z)) ∥_{2}^{2} + λ_{h} \cdot ∥ h (N_{ϑ} (z) ∥_{2}^{2} \end{matrix}

(37)

First, the training set is constructed by generating random solution samples

\hat{z}

with explicit equality constraint conditions. Specifically, we design an EqCompletion_Layer to handle the constraints, which includes normalizing the portfolio weights

x, y

and adjusting the mean of the loss sample

ε

to ensure that the specific equality constraint in Equation (33) is satisfied. At the same time, this layer optimizes the relationship between the constraints by adjusting factors, ensuring feasibility during the optimization process. During the solving process, our model needs to satisfy both inequality and equality constraints. To this end, the penalty term in the loss function includes constraints (31) (which relates to rebalancing costs, the second-stage portfolio returns, and other variables), constraint (32) (used to adjust portfolio loss reference point sample differences), and a series of non-negative constraints (35). These constraints are explicitly incorporated into the loss function penalty term using the L2 norm, which adjusts the variables to ensure that each solution is as close as possible to the feasible region.

We use a neural network model with fully connected structure, where the input layer size matches the dimensionality of the training data. The hidden layers employ ReLU activation functions, and the final output corresponds to the decision variables of the optimization problem. The optimization process utilizes the Adam optimizer in combination with an exponentially decaying learning rate scheduler to gradually reduce the learning rate, improving convergence and stability. The loss function includes penalty terms for multiple constraints, with the weight of each term adjusted using penalty factors (

λ_{g}, λ_{h} ≫ 0

) to ensure that the model can effectively balance the objective function and the constraints during training. Figure 2 illustrates the DL-CCA framework, and Algorithm 1 provides the corresponding pseudocode.

Algorithm 1: Deep learning-based constraint correction algorithm (DL-CCA)

1: Assume: Equality completion procedure

ψ (□) : □ \to \tilde{□}

to solve equality constraints, where,

□, \tilde{□} \in R^{m}

2: Initialize random sample solution:

\hat{z} = [\hat{x}, \hat{y}, \hat{ε}, \hat{ω}, \hat{ν}, \hat{θ}]; \hat{x}, \hat{y} \in R^{N}; \hat{ε}, \hat{ω} \in R^{S}; \hat{ν}, \hat{θ} \in R

.

3: Input: Training Set of solutions

\tilde{z} = [\tilde{x}, \tilde{y}, \tilde{ε}, \hat{ω}, \hat{ν}, \hat{θ}] = ψ (\hat{z})

, learning ratio (LR)

ι = ι_{0}

.

4: Initialize neural network

N_{ϑ} : R^{d \times m} \to R^{m}

5: for epoch = 1 to epochs do

6: Compute the output of the neural network layer

N_{ϑ} (\tilde{z})

7: Sample averaging layer processing,

Mean_Layer : R^{d \times m} \to R^{m}

,

\bar{z} = Mean (N_{ϑ} (\tilde{z}))

8: Equation constraint correction,

EqC_Layer : N_{ϑ} (z) = ψ (\bar{z})

9: Compute constraint-regularized loss:

L_{soft} (N_{ϑ} (z)) = f (N_{ϑ} (z)) + λ_{g} \cdot ∥ ReLU (g (N_{ϑ} (z)) ∥_{2}^{2} + λ_{h} \cdot ∥ h (N_{ϑ} {(z) ∥}_{2}^{2}

10: Update

ϑ

using

\nabla_{ϑ} l_{soft} (N_{ϑ} (z))

11: if epoch % 100 == 0 then

12: Update LR:

ι * = 0.9

13: end if

14: end for

15: Decoding the optimal solution

z^{*} = N_{ϑ}^{*} (\tilde{z})

16: return

x^{*}, y^{*}, ε^{*}, ω^{*}, ν^{*}, θ^{*}

6. Algorithm Experiments

6.1. Analysis of Optimal Network Parameters

We experimentally investigated the impact of three key parameters—neural network depth (num_Layer), hidden layer dimension (hidden_Size), and learning rate (learn_Rate)—on the average loss value (Avg.loss_Value) and average solution time (Avg.times) of the DL-CCA algorithm. A systematic analysis of the experimental results was conducted using heatmaps and three-way analysis of variance (ANOVA), as illustrated in Figure 3. The experiments addressed the DR-TSPO problem with a decision variable dimension of 200, and repeated trials were performed within the parameter ranges to obtain average values (

num_Layer \in {4, 5, 6}, hidden_Size \in {64, 128, 256, 512}, learn_Rate \in {0.0005, 0.001, 0.005}

).

6.2. Results of ANOVA

Through heatmap analysis, this study observed significant variations in model performance under different parameter combinations. In terms of solution accuracy, network complexity—particularly the hidden layer dimension (hidden_Size) and network depth (num_Layer)—emerged as critical factors. The experimental results demonstrated that when the hidden layer dimension was set to 256, which is close to the problem’s solution dimension, the model exhibited superior solution accuracy, as evidenced by a significant reduction in the average loss value across multiple trials. However, it is noteworthy that increasing network complexity, especially the hidden layer dimension, significantly prolonged the model’s solution time, potentially leading to higher computational costs when tackling large-scale optimization problems. Additionally, the learning rate setting played a crucial role in both the search accuracy for optimal solutions and computational efficiency. Experimental data indicated that a learning rate of approximately 0.001 yielded optimal performance.

Furthermore, through a 3-way ANOVA, we have statistically elucidated the significance of the individual parameters and their interaction effects on model performance. Initially, from the perspective of target loss as presented in Table 1, the learning rate (learn_Rate) exerts the most significant influence on the loss (F = 9.0533, p < 0.001), underscoring its pivotal role in determining model performance. The number of layers (num_Layer) also significantly affects the loss (F = 3.2942, p = 0.040), albeit with a relatively smaller effect size, suggesting that increasing the number of layers may optimize the loss to some extent, but the improvement is limited. In contrast, the size of the hidden layer (hidden_Size) does not significantly impact the loss (F = 0.0966, p = 0.962), indicating a weaker direct influence on model performance. Additionally, the interaction among the three factors is significant (p = 0.038), revealing that the combination of these parameters may exert complex nonlinear effects on the loss.

From the perspective of computational time, network complexity is the primary factor influencing the speed of solution. As shown in Table 2, the size of the hidden layer (hidden_Size) has the most significant impact on the solution time (F = 46815.854, p < 0.001), with an extremely high effect size, indicating that an increase in hidden layer size significantly escalates computational complexity. Furthermore, the number of hidden layers (num_Layer) also significantly affects the solution time (F = 822.7933, p < 0.001), although its effect size is slightly lower than that of the hidden layer size, suggesting that increasing the number of layers also adds to the computational burden, albeit to a lesser extent. Although the learning rate (learn_Rate) significantly influences the solution time as well (F = 55.8716, p < 0.001), its effect size is relatively low, indicating a limited impact on computational efficiency, with its setting primarily aimed at ensuring solution accuracy. A larger learning rate accelerates gradient descent but compromises solution precision, necessitating a judicious setting of the learning rate. Interaction analysis further reveals significant interactions between the number of layers and hidden layer size (p < 0.001), as well as between hidden layer size and learning rate (p < 0.001), with these parameter combinations further affecting solution time. Additionally, the three-way interaction among these factors also reaches a significant level (p = 0.029), further corroborating the intricate relationships among the parameters.

In summary, the learning rate is a pivotal parameter that ensures the accuracy of the solution, while the size and number of hidden layers significantly influence the computation time. The impact of the three key parameters (num_Layer, hidden_Size, and learn_Rate) on the algorithm’s application is not independent; their interactions are also crucial, particularly in terms of computation time, where the combined effects of multiple factors can substantially increase computational complexity. Therefore, in practical model optimization, it is essential to consider both the individual effects of each parameter and their interactions to achieve a balance between model performance and computational efficiency.

6.3. Comparison of Algorithms for Solving Large-Scale DR-TSPO

To validate the efficiency of the DL-CCA algorithm, which is based on deep neural networks, in solving large-scale complex constrained non-convex problems, we designed a series of comparative experiments. These included the Trust-Constr algorithm from the Scipy library, the heuristic Hippopotamus Optimization (HO) algorithm, and ablation studies on the neural network architectures embedded within the DL-CCA algorithm (fully connected neural networks, LSTM, and CNN). The experimental results demonstrate that the DL-CCA algorithm exhibits significant advantages in terms of solution accuracy, computation time, and constraint violation, with its performance benefits being particularly pronounced in high-dimensional problems.

Trust-Constr is a modern variant based on the trust-region method, which integrates interior-point and trust-region methods to construct an efficient algorithm for solving optimization problems with nonlinear constraints. It is capable of handling general nonlinear constraints while ensuring the feasibility of constraints at each iteration and stabilizing the optimization process through dynamic adjustment of the trust-region size [38]. The Hippopotamus Optimization Algorithm (HO) is a novel metaheuristic algorithm (intelligent optimization algorithm) inspired by the inherent behaviors of hippopotamuses. Research indicates that the HO algorithm outperforms the SSA algorithm on most functions [39]. According to real return data from S&P 500 constituent stocks, we evaluated the effectiveness of various algorithms in solving the non-convex optimization problem DR-TSPO. Specifically, as the scale of the problem increases (portfolio expansion), we assessed the solving efficiency of the DL-CCA algorithm compared to different algorithms and network architectures.

Speed: The time or number of iterations required for an algorithm to find the optimal solution when solving optimization problems with a large number of variables and constraints. Solution speed is influenced by several factors, including problem size, algorithm complexity, problem structure (such as sparsity or nonlinearity), and available hardware and computational resources.
Feasibility: Feasibility refers to whether the obtained solution satisfies all the given constraints. For constrained optimization problems, feasibility can be measured by how well the constraints are satisfied, particularly with respect to equality and inequality constraints. The average constraint violation is defined as:

$C o n s t r_v i o = \frac{\sum_{i}^{n_C o n s t r} max {B i a s_C o n s t r (i), 0}}{n_C o n s t r},$

where $n_C o n s t r$ is the number of constraints, and $B i a s_C o n s t r$ represents the constraint violation.
Optimality: This refers to whether the algorithm can converge within a finite number of iterations. The iteration limit is set to 2000, and the convergence condition is defined by a gradient tolerance of $G r a d . t o l \leq 10^{- 6}$ .

6.4. Result of Algorithm Comparison

Table 3 presents the results of solving the DR-TSPO problem with variable dimensions

Dim_Var = 50, 100, \dots, 500

. We compared the average constraint violation for equality/inequality constraints between the DL-CCA and Trust-Constr algorithms under the task of seeking the optimal objective value

Optimal_Obj

. Additionally, as the problem scale increased, we compared the total runtime required by both algorithms on test instances, assuming full parallelization.

First, in terms of solution accuracy (

Optimal_Obj (Min .)

), the DL-CCA algorithm achieved significantly lower optimal objective values across all dimensions compared to the HO and Trust-Constr algorithms. For instance, when

Dim_Var = 50

, the optimal objective value for DL-CCA was 0.0024, while those for Trust-Constr and HO were 0.0047 and 0.0720, respectively. When

Dim_Var = 500

, the optimal objective value for DL-CCA was 0.0030, compared to 0.0037 and 0.0134 for Trust-Constr and HO, respectively. This indicates that the DL-CCA algorithm is more effective in approximating the global optimal solution, particularly in high-dimensional problems, where its precision advantage becomes more pronounced.

In terms of computation time (

Sol_time (Sec .)

), the DL-CCA algorithm also outperformed the comparative algorithms. Specifically, when solving high-dimensional, large-scale optimization problems (e.g.,

Dim_Var \geq 300

), the experimental results demonstrate that DL-CCA achieves faster solution times. Even in low-dimensional problems, DL-CCA’s computation time was significantly lower than that of the HO algorithm. For example, when

Dim_Var = 50

, DL-CCA’s computation time was 12.25 s, compared to 11.22 s for HO, but DL-CCA’s solution accuracy was substantially higher. Furthermore, as the problem dimension increased, the computation time for exact algorithms rose sharply, while DL-CCA’s computation time grew more gradually, indicating its superior computational efficiency in high-dimensional problems.

Although the Trust-Constr algorithm exhibits slightly better performance in terms of inequality constraint violations, its equality constraint violations are comparable to those of DL-CCA, while the HO algorithm shows significantly higher equality constraint violations than DL-CCA. This indicates that the DL-CCA algorithm is better at satisfying constraint conditions during the optimization process, thereby ensuring solution feasibility. In the ablation experiments of the DL-CCA algorithm, we observed that the DL-CCA algorithm employing a fully connected neural network (FCNN) outperformed those using LSTM and CNN in both solution accuracy and computation time. Specifically, the average optimal objective value for DL-CCA-FCNN was 0.0029, compared to 0.0405 and 0.0279 for DL-CCA-LSTM and DL-CCA-CNN, respectively. Additionally, the average computation time for DL-CCA-FCNN was 76.78 s, significantly lower than the 224.38 s for DL-CCA-LSTM and 413.26 s for DL-CCA-CNN.

Figure 4 illustrates the trends in computation time and optimal values as the problem scale increases for various algorithms. The DL-CCA algorithm demonstrates significant advantages over the exact solver Trust-Constr and the heuristic algorithm HO in terms of solution accuracy and computation time when addressing large-scale complex constrained non-convex problems. Furthermore, the DL-CCA algorithm utilizing FCNN outperforms those employing LSTM and CNN, further validating the effectiveness of FCNN in handling high-dimensional nonlinear optimization problems. These experimental results robustly demonstrate the efficiency and robustness of the DL-CCA algorithm in practical applications.

7. DR-TSPO vs. TSPO Empirical Validation

To validate the advantages of the DR-TSPO model based on loss aversion, we design a comparative experiment to benchmark it against the traditional two-stage robust optimization model. The core of the experiment is to examine the model’s performance under different market distributions when the loss reference point is treated as a random variable. The experiment will optimize a set of real market data using both models and assess their robustness and loss aversion capabilities in practical decision making. By comparing the portfolio’s average drawdown, return, volatility, and risk-adjusted performance metrics, we aim to demonstrate the adaptability and advantages of the loss aversion-based DR-TSPO model in markets with unknown loss reference distributions.

7.1. Experimental Data and Evaluation Metrics

Between 2019 and 2020, global financial markets experienced significant volatility and high uncertainty. The market trend in 2019 was relatively stable, while the outbreak of the COVID-19 pandemic in 2020 triggered a global financial crisis, leading to sharp market fluctuations, a significant drop in stock prices, and a further intensification of the economic recession. Against this backdrop, the experiment can more effectively test the risk transmission of portfolios in both normal market conditions (2019) and extreme market scenarios (2020), thereby assessing their robustness and resilience during financial crises, as shown in Figure 5. To achieve this, we selected constituent stocks from eight global market indices with different distribution characteristics, including SSE50, Hang Seng Index (HSI), FTSE Index, French CAC40 Index (FCHI), German DAX Index (GADXI), Russian RTS Index (RTS), Nikkei 225 Index (N225), and Nasdaq 100 Index (NDX). These indices represent the economic conditions and market volatility of different regions globally, providing high distribution diversity and representativeness. Specific information and distribution characteristics are presented in Table 4.

The experiment uses data from 1 January 2019 to 31 December 2019 as the in-sample data for constructing the first-stage portfolio; the second stage is based on 2020 data to assess the portfolio’s out-of-sample performance. The evaluation focuses on two main aspects. (1) First, the portfolio’s ability to avoid losses during the financial crisis (the global pandemic and economic recession in 2020) is assessed. Losses are characterized by comparing the average drawdown and the portfolio rebalancing magnitude in both stages. If the out-of-sample average drawdown decreases as a result, the model is considered to have better robustness and effectively reduces the average loss in the second stage. (2) Second, traditional metrics will be used to evaluate the portfolio’s return and volatility.

Annualized Return: Mean annualized portfolio return.
Standard deviation: Standard deviation of annualized portfolio return.
Maximum.Drawdown: Maximum portfolio drawdown.
Sharpe.Ratio: Sharpe Ratio represents the excess return per unit of total risk taken. $T R = \frac{E (R_{p}) - R_{f}}{σ_{p}}$ .
Beta: Beta represents the relationship between portfolio return volatility and market return volatility, which is a measure of systematic risk. $β = \frac{C o v (p, m)}{σ_{m}^{2}} = ρ_{p m} \frac{σ_{p}}{σ_{m}}$ .
Sortino Ratio: $SR = \frac{R_{p} - R_{f}}{\sqrt{\frac{1}{T} \sum_{i = 1}^{T} min {(0, R_{i} - R_{f})}^{2}}}$ . The risk-adjusted return of a portfolio after accounting for negative volatility.
Autocorrelation: A statistic used to measure the correlation between a variable in time series data and itself at different time points. $ρ_{k} = \frac{\sum_{t = k + 1}^{n} (R_{p t} - \bar{R_{p}}) (R_{p (t - k)} - \bar{R_{p}})}{\sum_{t = 1}^{n} {(R_{p t} - \bar{R_{p}})}^{2}}$ .
Rolling Returns: An indicator that calculates the return of a portfolio or asset over a moving window.
Mean Wealth: Average wealth in the second stage.

7.2. Results and Discussions

First, the experiment compares the mean drawdown and the two-stage rebalancing magnitude of the portfolio on out-of-sample data. The results in Figure 6 show significant regional differences under different market conditions (i.e., different distribution scenarios), especially among the EU countries represented by FTSE/FCHI/GDAXI and the U.S. and Japanese indices represented by N225/NDX. Specifically, compared to the TSPO model, the DR-TSPO model under the distributionally robust optimization framework demonstrates a clear advantage in robustness. Its out-of-sample average drawdown is consistently lower than that of the TSPO model and at least no worse.

In the Eurozone markets (such as the UK, France, and Germany), the advantages of DR-TSPO are not as pronounced, likely due to the high interconnectivity of market information and relatively clear market distributions (low uncertainty regarding market distributions for investors). In these markets, the loss aversion optimization model demonstrates better drawdown control than the market index, but the DR-TSPO model performs similarly to the standard TSPO in terms of ambiguity aversion and employs similar rebalancing strategies. In contrast, in other markets, the DR-TSPO model generally outperforms the non-robust model. Particularly in the Hong Kong and Russian stock markets (represented by HSI and RTS), the DR-TSPO model shows better adaptability through effective rebalancing compared to TSPO. For example, in the HSI index experiment in Hong Kong, the two-stage loss aversion model outperforms the market index, and the DR-TSPO further strengthens loss control. In the Russian market, due to its high volatility, although the TSPO model underperforms the market index, the DR-TSPO achieves better drawdown levels through larger rebalancing. While in markets such as China and the U.S., the two-stage loss aversion portfolio’s drawdown performance is inferior to the market index, the DR-TSPO shows better adaptability and robustness when facing unknown market distributions compared to the non-robust TSPO model.

In subsequent experiments, we further analyze the performance of the TSPO and DR-TSPO models across several major stock indices in 2020, aiming to compare their risk and return characteristics in market environments with different distributional features. In 2020, global markets faced extreme volatility triggered by the COVID-19 pandemic and sharp changes in monetary policies by governments, resulting in a significant increase in market uncertainty. Against this backdrop, the main objective of portfolio optimization models is to achieve higher returns while maintaining a conservative approach.

As shown in Table 5, the portfolio performance of the TSPO model and DR-TSPO model across different markets is presented.

On the SSE50 index, the DR-TSPO model achieves an annualized return of 0.4635, which is significantly higher than the TSPO’s 0.3020. Additionally, the DR-TSPO model has a lower volatility of 0.4430 compared to TSPO’s 0.5846, and it also performs better in terms of maximum drawdown, with DR-TSPO at 0.3405 versus TSPO’s 0.4528, indicating better capital protection. In 2020, the Chinese market experienced severe volatility during the early stages of the COVID-19 pandemic. The DR-TSPO model, using a distributionally robust optimization framework, was able to capture rebound opportunities under extreme market conditions while effectively reducing risk. This robust return characteristic highlights the advantage of DR-TSPO in highly volatile market environments. Especially during the pandemic and its aftermath, the traditional TSPO model may struggle to cope with such high market uncertainty and rapid changes, while the DR-TSPO effectively mitigates risk through its optimization strategy. Similarly, in the N225-based experiment, the DR-TSPO model shows an annualized return of 0.7257, slightly higher than TSPO’s 0.7189, with a volatility of 0.3119, compared to TSPO’s 0.4139, indicating better stability. In terms of maximum drawdown, DR-TSPO also outperforms TSPO, with 0.3533 versus 0.3830. In the experiments in the Chinese Shanghai and Japanese markets, the DR-TSPO model significantly outperforms the traditional two-stage loss aversion model in both volatility and profitability. The risk-adjusted Sharpe ratio and Sortino ratio further validate that the distributionally robust optimization method can maintain a certain level of robustness while overcoming over-conservatism, aiming for higher portfolio returns.

Moreover, the DR-TSPO model also performs well in the Eurozone markets, including the FTSE, FCHI, and GDAXI indices. Although the differences in various metrics are small, for example, the annualized return on the FTSE is 0.1641 for DR-TSPO, slightly higher than TSPO’s 0.1632, the volatility of DR-TSPO is significantly lower than that of TSPO. Specifically, the volatility for FTSE is 0.3460 for TSPO and 0.3444 for DR-TSPO, while for FCHI, the volatility is 0.3460 for DR-TSPO and 0.3466 for TSPO. For the German market (GDAXI), the annualized return for DR-TSPO is 1.1657, slightly higher than TSPO’s 1.1646, but the maximum drawdown is notably lower, with DR-TSPO at 0.2495 compared to TSPO’s 0.2496. The small differences can mainly be attributed to the influence of common monetary policies (such as European Central Bank regulation), close economic interconnections, synchronized impacts from the global economic environment, and similar industry structures and capital market linkages in the Eurozone markets. Due to the similar market distribution characteristics, the adaptive advantage of the robust optimization framework in addressing unknown distributions is not as pronounced. However, DR-TSPO excels in controlling volatility and maximum drawdown, demonstrating its advantage in more mature and volatile markets.

A particularly notable performance is observed in the RTS market, where the Russian economy faced multiple challenges such as a sharp drop in oil prices, the outbreak of the COVID-19 pandemic, and international sanctions, leading to extremely high financial market uncertainty. As a high-volatility market dominated by downward trends, this posed a significant test for the risk control capabilities of portfolio optimization methods. The experimental results in Table 5 indicate that TSPO performed relatively poorly, with an annualized return of −0.0905 compared to −0.1207 for DR-TSPO, volatility (0.3101 vs. 0.3759), and maximum drawdown (0.4093 vs. 0.5532), suggesting that investors could face substantial losses. In contrast, DR-TSPO, with its robust optimization model, effectively reduced volatility in such a high-risk environment, protecting investors from excessive losses. Even though all return-related metrics were negative, DR-TSPO’s ability to control volatility and manage risk remained a significant advantage, especially in a market characterized by high uncertainty and volatility. This allowed DR-TSPO to outperform traditional TSPO in avoiding extreme losses.

For the HSI (Hong Kong Hang Seng Index), although the annualized return of DR-TSPO (1.2015) is slightly lower than TSPO (1.7232), its volatility (0.4516) is significantly lower than TSPO’s (0.5479), and the maximum drawdown (0.3579 vs. 0.4087) also outperforms TSPO. Similarly, in the NDX index experiment, despite the limited profitability of DR-TSPO, it still demonstrates strong risk control, effectively smoothing market fluctuations and providing more stable returns. Looking at the wealth growth across different markets (as shown in Figure 7), the extreme declines in the RTS and N225 markets were the most severe, and the DR-TSPO model exhibited superior performance compared to the traditional TSPO model. Not only did it excel in volatility and maximum drawdown control, but it also helped avoid sudden losses from unknown factors, reflecting its conservative advantage. Overall, DR-TSPO, through its distribution-robust optimization framework, is better at balancing risk and return in high-uncertainty and high-volatility markets, offering better capital protection and delivering more robust investment performance.

In conclusion, the DR-TSPO model, through its extensive application and in-depth comparative testing across various international markets, has demonstrated its superior risk management and capital protection capabilities compared to the traditional TSPO model. This is especially true when faced with extreme market volatility, high uncertainty, and multiple external shocks such as the COVID-19 pandemic, oil price fluctuations, and international sanctions. With its advanced distribution-robust optimization framework, DR-TSPO excels in controlling volatility and maximum drawdown in markets ranging from China and Japan to the Eurozone and Russia. Even in markets where annualized returns do not show a clear advantage, its robust risk-adjusted performance stands out. This comprehensive and balanced investment strategy not only helps investors minimize potential losses in high-volatility environments but also supports more sustainable and steady wealth growth over the long term. Therefore, the DR-TSPO model provides a new and more reliable methodology for portfolio optimization, especially in the current global economic climate, which is increasingly complex and volatile. Its application value and practical significance are particularly prominent in this context.

8. Conclusions

Unlike existing studies that rely on predetermined loss reference points [4,5], our work integrates loss aversion theory with distributionally robust optimization to develop an adaptive loss reference point mechanism. This mechanism dynamically adjusts based on investors’ historical decisions and prevailing market conditions, thereby addressing the limitations of static reference point approaches. By explicitly capturing the path-dependent nature of loss aversion, our framework enhances decision-making flexibility across diverse market regimes. The proposed DR-TSPO model employs uncertainty sets to address distributional ambiguity, relaxing the conventional reliance on strict prior distribution assumptions [26,27]. This approach optimizes worst-case expected utility while maintaining robustness across different market regimes. The solution methodology demonstrates that the dual problem can be reformulated as a tractable second-order cone program. To enhance computational efficiency, we propose DL-CCA, a deep learning-based optimization algorithm that embeds constraint penalties within neural networks. Experimental comparisons demonstrate DL-CCA’s superior performance in solving large-scale optimization problems; it achieves an average optimal objective value of 0.0029 with merely 76.78 s of computation time, significantly outperforming traditional algorithms like Trust-Constr [38] and HO [39], as well as LSTM/CNN-based variants.

Comprehensive backtesting using global equity data reveals that DR-TSPO delivers stronger performance in volatile markets compared to traditional models. For instance, in China’s market, it achieves higher annualized returns (0.4635 vs. 0.3020) with lower volatility (0.4430 vs. 0.5846), demonstrating improved capital protection. The model particularly excels during extreme market conditions, such as the 2019-2020 period, where it provides more effective risk mitigation. The multidimensional implications of this research offer valuable insights for various financial market participants. For investors, the dynamic reference point mechanism optimizes decision-making processes and reduces irrational trading behaviors. Asset management institutions can leverage the DL-CCA algorithm’s efficiency to enable real-time portfolio rebalancing of complex strategies, thereby enhancing robo-advisory systems. Regulators may consider incorporating the model’s stress-testing performance into systemic risk monitoring frameworks. For policymakers, the findings support the development of algorithmic transparency standards and cross-border regulatory coordination to address emerging challenges in financial technology.

Future research directions could explore multi-asset extensions, macroeconomic factor integration, and reinforcement learning applications to further advance investment decision paradigms toward more intelligent and market-adaptive approaches.

Author Contributions

Conceptualization, X.Z. and S.L.; methodology, X.Z.; software, X.Z.; validation, S.L.; formal analysis, J.P.; investigation, S.L.; resources, X.Z.; data curation, X.Z. and S.L.; writing—original draft preparation, X.Z.; writing—review and editing, S.L.; visualization, X.Z.; supervision, S.L.; project administration, X.Z.; funding acquisition, S.L. and J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant 71771008 and 72271013.

Data Availability Statement

The original contributions presented in this study are included in the article.Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

DRO	Distributionally Robust Optimization
DR-TSPO	Distributionally Robust Two-Stage 82 Portfolio Optimization
TSPO	Two-Stage Portfolio Optimization
LPM	Lower Partial Moment
WCEL	Worst-case scenario Expected Loss
DL-CCA	Deep Learning-based Constraint Correction Algorithm
HO	Hippopotamus Optimization
FCNN	Fully Connected Neural Network
LSTM	Long Short-Term Memory
CNN	Convolutional Neural Networks
Trust-Constr	Trust-Region Constrained Optimization

Appendix A

Table A1. Notation.

$[N]$ :	The number of stocks available for configuration is represented by the total stock asset set $[N]$ , with the index of each individual stock j for all $j \in [N]$ .
$l, \hat{l}$ :	ℓ represents the random loss reference point for the loss-averse investor, which is related to previous returns and is used to measure whether the portfolio’s return aligns with the investor’s psychological expectations. $\hat{l}$ is the prior estimate of the random variable ℓ. The bold notation $\hat{l} = ({\hat{l}}_{1}, {\hat{l}}_{2}, \dots, {\hat{l}}_{S})$ , where $s \in [S]$ , represents the sample vector.
$μ_{\hat{l}}, σ_{\hat{l}}^{2}$ :	The expected value and variance of the generated sample reference points are given by $Σ_{\hat{l}} = σ_{\hat{l}}^{2} \cdot I_{s}$ , which is an invertible diagonal matrix, and $Σ_{\hat{l}} \in S_{+ +}$ , where $S_{+ +}$ denotes the set of positive definite matrices.
$ε$ :	The noise vector used for sample generation.
$[S]$ :	A set of scenarios (or the number of samples) for loss aversion references, obtained based on expert opinions or historical statistical datasets, and dependent on the portfolio decision $x$ .
$μ, Σ$ :	The expected return vector and covariance matrix of the stock’s random returns.
$μ_{\hat{l}}, σ_{\hat{l}}^{2}$ :	The sample expected value and variance of the loss aversion reference point sample $\hat{l}$ .
$β, γ > 0$ :	The adjustment coefficient, which scales the function’s range to a reasonable interval of values.
c:	The stock transaction cost coefficient.
$φ$ :	The investor’s loss aversion coefficient $φ > 0$ , which is an indicator used in behavioral economics to measure the investor’s sensitivity to losses compared to gains.
$y_{m k t}$ :	The market portfolio weights, serving as the reference decision targets for the investor.
$ϵ$ :	The given radius of the Wasserstein distance ball, which is used to adjust the size of the distributional ambiguity set.
Decision variable:
$x$ :	The continuous decision variable, $x = {(x_{1}, x_{2}, \dots, x_{n})}^{⊤}$ , represents the decision weight vector for the current portfolio, where $x_{j} \in [0, 1], \forall j \in [N]$ .
$y$ :	The previous decision vector $y = {(y_{1}, y_{2}, \dots, y_{n})}^{⊤}$ is related to the realization of the current loss aversion reference point. Under the assumption of the robust ambiguity set, the true distribution of the random loss reference point lies within the $ϵ$ -neighborhood of the scenario distribution based on the previous decision $y$ .
$ε$ :	The noise vector of the unknown distribution, $ε = {(ε_{1}, ε_{2}, \dots, ε_{s})}^{⊤}$ , whose distribution moments are related to the previous decision.

Bold symbols represent vectors.

Appendix B. Proof of Theorem 1

We adopt a hierarchical optimization approach, first solving the subproblem of the expected loss in the worst-case scenario (Worst-case scenario Expected Loss, WCEL). Then, under the realization of any event decision

\hat{y}

, there is a reference distributional ambiguity set

l \sim P, P \in B ({\hat{P}}_{(y)})

, that is,

\begin{matrix} (WCEL) : sup_{Π (d l, \hat{l}) \geq 0} & \int_{L} φ \cdot \sum_{s \in [S]} {[l - μ^{⊤} x, 0]}_{+}^{2} Π (d l, {\hat{l}}_{s}) \end{matrix}

(A1)

\begin{matrix} s . t . & \int_{L} Π (d l, {\hat{l}}_{s}) = \frac{1}{S}, \forall s \in [S] \end{matrix}

(A2)

\begin{matrix} \int_{L} \sum_{s \in [S]} {∥ l - {\hat{l}}_{s} ∥}_{2}^{2} Π (d l, \hat{l}) \leq ϵ . \end{matrix}

(A3)

This is a second-order lower partial moment (LPM) optimization problem under an unknown distribution.

Π

represents the joint distribution between the true loss ℓ and the estimated loss

{\hat{l}}_{s}

. The constraint

(A 2)

indicates that the marginal distribution

Π (d l, {\hat{l}}_{s})

of the joint distribution

Π

must satisfy the normalization condition, ensuring that the total weight of each estimated sample

{\hat{l}}_{s}

is

\frac{1}{S}

. In the scenario set

{{\hat{l}}_{s}}_{s \in [S]}

, we use Dirac distributions to represent the sample locations of each scenario s. Each sample can be treated as corresponding to a Dirac distribution

δ_{{\hat{l}}_{s}}

, meaning that in each scenario s, the sample

{\hat{l}}_{s}

is deterministic, and in that scenario, all the probability mass is concentrated at this point. The constraint

(A 3)

controls that the Wasserstein distance between the true distribution and the reference distribution does not exceed the threshold

ϵ

.

To construct the Lagrangian function of the optimization problem and its corresponding dual problem, we first need to incorporate the constraints of the optimization problem into the objective function, then apply the method of Lagrange multipliers to solve it. By introducing the Lagrange multipliers

ω

and

ν > 0

for the constraints

(A 2)

and

(A 3)

, respectively, the Lagrangian function for the internal subproblem (WCEL) is expressed as:

\begin{matrix} L (Π, ω, ν) = & φ \cdot \int_{L} \sum_{s \in [S]} {[l - μ^{⊤} x, 0]}_{+}^{2} Π (d l, {\hat{l}}_{s}) - \int_{L} \sum_{s \in [S]} ω_{s} Π (d l, {\hat{l}}_{s}) \\ - \int_{L} \sum_{s \in [S]} ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2} Π (d l, {\hat{l}}_{s}) + \frac{1}{S} \sum_{s \in [S]} ω_{s} + ν ϵ . \end{matrix}

(A4)

Therefore, the Lagrangian dual function can be expressed as:

\begin{matrix} inf_{ω, ν} g (ω, ν) \\ = & inf_{ω, ν} sup_{Π (d l, \hat{l}) \geq 0} L (Π, ω, ν) \\ = & inf_{ω, ν} sup_{Π (d l, \hat{l}) \geq 0} \int_{L} \sum_{s \in [S]} ({[l - μ^{⊤} x, 0]}_{+}^{2} - ω_{s} - ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2}) Π (d l, {\hat{l}}_{s}) \\ + \frac{1}{S} \sum_{s \in [S]} ω_{s} + ν ϵ . \end{matrix}

Noting that when

φ \cdot {[l - μ^{⊤} x]}_{+}^{2} - ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2} > ω_{s}

, we can focus on adjusting

Π (d l, \hat{l}) \geq 0

to ensure that

L (Π, ω, ν) \to - \infty

. Therefore, it must hold that

φ \cdot {[l - μ^{⊤} x]}_{+}^{2} - ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2} \leq ω_{s}

. With this constraint, the minimization problem can be optimized when

Π (d l, \hat{l}) = 0

. The strong duality theorem holds, leading to the equivalent dual problem of WCEL (DWCEL).

\begin{matrix} (DWCEL) : min_{ω, ν} & \frac{1}{S} \sum_{s \in [S]} ω_{i} + ν ϵ \end{matrix}

(A5)

\begin{matrix} s . t . & φ \cdot {[l - μ^{⊤} x]}_{+}^{2} - ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2} \leq ω_{s}, \forall l \in L_{s}, s \in [S] \end{matrix}

(A6)

\begin{matrix} ν > 0 . \end{matrix}

(A7)

Let

h (l) = φ \cdot {[l - μ^{⊤} x]}_{+}^{2} - ω - ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2}

. Next, we consider how to ensure that

h (l)

is always less than or equal to zero. This can be achieved by ensuring that the following constraint conditions are satisfied.

\begin{matrix} sup_{l \in L} h (l) = sup_{l \in L} (φ \cdot {[l - μ^{⊤} x, 0]}_{+}^{2} - ω_{s} - ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2}) \leq 0 \end{matrix}

(A8)

Assume that for

l \in L_{s}

, the range of values for ℓ can be divided into two cases:

l > μ^{⊤} x

and

l \leq μ^{⊤} x

, and we will discuss each case separately.

PART 1. When

l > μ^{⊤} x

, the loss aversion utility is quadratic, and we have:

h (l) = φ \cdot {(l - μ^{⊤} x)}^{2} - ω_{s} - ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2} .

Let

Δ u_{s} = l - {\hat{l}}_{s}, Δ u_{s} \in R

, then

\begin{matrix} sup_{l \in L} φ \cdot {(l - μ^{⊤} x)}^{2} - ω_{s} - ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2} \end{matrix}

(A9)

\begin{matrix} = & sup_{Δ u_{s} \in R} φ \cdot {(Δ u_{s} + [{\hat{l}}_{s} - μ^{⊤} x])}^{2} - ω_{s} - ν \cdot {∥ Δ u_{s} ∥}_{2}^{2} \end{matrix}

(A10)

\begin{matrix} = & sup_{t \geq 0} sup_{∥ Δ u_{s} ∥_{2}^{2} = t} φ \cdot [Δ u_{s}^{2} + {({\hat{l}}_{s} - μ^{⊤} x)}^{2} + 2 ({\hat{l}}_{s} - μ^{⊤} x) Δ u_{s}] - ω_{s} - ν \cdot t \end{matrix}

(A11)

\begin{matrix} = & sup_{∥ Δ u_{s} ∥_{2}^{2} \geq 0} \{(φ - ν) Δ u_{s}^{2} + 2 φ ({\hat{l}}_{s} - μ^{⊤} x) Δ u_{s}\} + φ {({\hat{l}}_{s} - μ^{⊤} x)}^{2} - ω_{s} \end{matrix}

(A12)

where the term

φ {({\hat{l}}_{s} - μ^{⊤} x)}^{2} - ω_{s}

can be treated as a constant that does not change with

Δ u_{s}

. If

φ - ν \geq 0

; this function is either linear or convex, and its maximum value occurs at the boundary points. Since the domain of

Δ u_{s}

is the positive real number domain

R

, the value tends to infinity as

Δ u_{s} \to \pm \infty

. However, when

φ - ν < 0

, the function is concave, and its maximum value can be found by solving for the point where the gradient is zero. Let:

\nabla_{Δ u_{s}} = 2 (φ - ν) Δ u_{s} + 2 φ ({\hat{l}}_{s} - μ^{⊤} x) = 0 .

We have

Δ u_{s}^{*} = - \frac{φ ({\hat{l}}_{s} - μ^{⊤} x)}{φ - ν} .

Substituting

Δ u_{s}^{*}

back into equation (A12), when

φ - ν > 0

, the upper bound of

{sup}_{l \in L} h (l)

is:

\begin{matrix} sup_{∥ Δ u_{s} ∥_{2}^{2} \geq 0} \{(φ - ν) Δ u_{s}^{2} + 2 φ ({\hat{l}}_{s} - μ^{⊤} x) Δ u_{s}\} + φ {({\hat{l}}_{s} - μ^{⊤} x)}^{2} - ω_{s} \\ = & (φ - ν) {(- \frac{φ ({\hat{l}}_{s} - μ^{⊤} x)}{φ - ν})}^{2} + 2 φ ({\hat{l}}_{s} - μ^{⊤} x) (- \frac{φ ({\hat{l}}_{s} - μ^{⊤} x)}{φ - ν}) + φ {({\hat{l}}_{s} - μ^{⊤} x)}^{2} - ω_{s} \\ = & - \frac{φ^{2} {({\hat{l}}_{s} - μ^{⊤} x)}^{2}}{φ - ν} + φ {({\hat{l}}_{s} - μ^{⊤} x)}^{2} - ω_{s} . \end{matrix}

Therefore, the constraint

(A 8)

is equivalent to:

\{\begin{matrix} ω_{s} \geq \frac{φ^{2} {({\hat{l}}_{s} - μ^{⊤} x)}^{2}}{(ν - φ)} + φ {({\hat{l}}_{s} - μ^{⊤} x)}^{2} & , φ - ν \leq 0 \\ + \infty & , φ - ν > 0 \end{matrix}

PART 2. If

l \leq μ^{⊤} x

, then

{[l - μ^{⊤} x, 0]}_{+}^{2} = 0

, meaning that no loss aversion emotion is generated in this case.

sup_{l \in L} h (l) = sup_{l \in L} - ω_{s} - ν \cdot {∥ l - {\hat{l}}_{s} ∥}_{2}^{2}

Since

ν \geq 0

, the constraint

(A 8)

is equivalent to

ω_{s} \geq 0

. In conclusion, the DWCEL problem is equivalent to:

\begin{matrix} min_{ω, ν} & \frac{1}{S} \sum_{s \in [S]} ω_{s} + ν ϵ \end{matrix}

(A13)

\begin{matrix} s . t . & ω_{s} \geq \frac{φ^{2} {({\hat{l}}_{s} - μ^{⊤} x)}^{2}}{(ν - φ)} + φ {({\hat{l}}_{s} - μ^{⊤} x)}^{2}, s \in [S] \end{matrix}

(A14)

\begin{matrix} 0 \leq φ \leq ν, ω_{s} \geq 0, \forall s \in [S] . \end{matrix}

(A15)

By substituting the dual problem of the subproblem (DWCEL) into the original problem, the DR-TSPO model is equivalent to solving:

\begin{matrix} max_{y \in X, \hat{l} \in \hat{L} (y), ε \in R^{S}} & μ^{⊤} y - min_{x \in X, ω, ν} \frac{1}{S} \sum_{s \in [S]} ω_{s} + ν ϵ - μ^{⊤} x + c \sum_{i \in [N]} |x_{i} - y_{i}| \end{matrix}

(A16)

\begin{matrix} s . t . & (\frac{1}{φ} - \frac{1}{ν}) \cdot ω_{s} \geq {({\hat{l}}_{s} - μ^{⊤} x)}^{2}, \forall \hat{l} \in {\hat{L}}_{s}, s \in [S] \end{matrix}

(A17)

\begin{matrix} {\hat{l}}_{s} = β \cdot μ^{⊤} (y - y_{m k t}) + ε_{s} \cdot 1_{N}^{⊤} μ, \forall s \in [S] \end{matrix}

(A18)

\begin{matrix} \frac{1}{S - 1} \sum_{s \in [S]} ε_{s}^{2} = γ \cdot {∥ y - y_{m k t} ∥}_{2}^{2} \end{matrix}

(A19)

\begin{matrix} \sum_{i \in [N]} x_{i} = 1, \sum_{i \in [N]} y_{i} = 1, \sum_{s \in [S]} ε_{s} = 0, \end{matrix}

(A20)

\begin{matrix} x_{i}, y_{i} \geq 0, \forall i \in [N], ω_{s} \geq 0, 0 \leq φ \leq ν . \end{matrix}

(A21)

By substituting constraint

(A 18)

into

(A 17)

and introducing the upper bound parameter

θ

for further simplification, we get:

\begin{matrix} max_{x, y, ω, ν, θ, ε} & μ^{⊤} y - θ \end{matrix}

(A22)

\begin{matrix} s . t . & \frac{1}{S} \sum_{s \in [S]} ω_{s} + ν ϵ - μ^{⊤} x + c \sum_{i \in [N]} |x_{i} - y_{i}| \leq θ \end{matrix}

(A23)

\begin{matrix} (\frac{1}{φ} - \frac{1}{ν}) \cdot ω_{s} \geq {(β \cdot μ^{⊤} (y - y_{m k t}) + ε_{s} \cdot 1_{S}^{⊤} μ - μ^{⊤} x)}^{2}, \forall s \in [S] \end{matrix}

(A24)

\begin{matrix} \frac{1}{S - 1} \sum_{s \in [S]} ε_{s}^{2} = γ \cdot {∥ y - y_{m k t} ∥}_{2}^{2} \end{matrix}

(A25)

\begin{matrix} \sum_{i \in [N]} x_{i} = 1, \sum_{i \in [N]} y_{i} = 1, \sum_{s \in [S]} ε_{s} = 0, \end{matrix}

(A26)

\begin{matrix} x_{i}, y_{i} \geq 0, \forall i \in [N], ω_{s} \geq 0, 0 \leq φ \leq ν . \end{matrix}

(A27)

References

Kahneman, D.; Tversky, A. Prospect Theory: An Analysis of Decision under Risk. Econometrica 1979, 47, 263–291. [Google Scholar] [CrossRef]
Barberis, N.; Huang, M.; Santos, T. Prospect theory and asset prices. Q. J. Econ. 2001, 116, 1–53. [Google Scholar] [CrossRef]
Zhang, W.; Semmler, W. Prospect theory for stock markets: Empirical evidence with time-series data. J. Econ. Behav. Organ. 2009, 72, 835–849. [Google Scholar] [CrossRef]
Barberis, N.; Xiong, W. What drives the disposition effect? An analysis of a long-standing preference-based explanation. J. Financ. 2009, 64, 751–784. [Google Scholar] [CrossRef]
Shi, Y.; Cui, X.; Li, D. Discrete-time behavioral portfolio selection under cumulative prospect theory. J. Econ. Dyn. Control 2015, 61, 283–302. [Google Scholar] [CrossRef]
Van Bilsen, S.; Laeven, R.J.; Nijman, T.E. Consumption and portfolio choice under loss aversion and endogenous updating of the reference level. Manag. Sci. 2020, 66, 3927–3955. [Google Scholar] [CrossRef]
Gao, J.; Li, Y.; Shi, Y.; Xie, J. Multi-period portfolio choice under loss aversion with dynamic reference point in serially correlated market. Omega 2024, 127, 103103. [Google Scholar] [CrossRef]
Berkelaar, A.B.; Kouwenberg, R.; Post, T. Optimal portfolio choice under loss aversion. Rev. Econ. Stat. 2004, 86, 973–987. [Google Scholar] [CrossRef]
Jin, H.; Yu Zhou, X. Behavioral portfolio selection in continuous time. Math. Financ. Int. J. Math. Stat. Financ. Econ. 2008, 18, 385–426. [Google Scholar]
He, X.D.; Zhou, X.Y. Myopic loss aversion, reference point, and money illusion. Quant. Financ. 2014, 14, 1541–1554. [Google Scholar] [CrossRef]
He, X.D.; Zhou, X.Y. Portfolio choice under cumulative prospect theory: An analytical treatment. Manag. Sci. 2011, 57, 315–331. [Google Scholar] [CrossRef]
De Giorgi, E.G.; Legg, S. Dynamic portfolio choice and asset pricing with narrow framing and probability weighting. J. Econ. Dyn. Control 2012, 36, 951–972. [Google Scholar] [CrossRef]
Shi, Y.; Cui, X.; Yao, J.; Li, D. Dynamic trading with reference point adaptation and loss aversion. Oper. Res. 2015, 63, 789–806. [Google Scholar] [CrossRef]
Zou, B.; Zagst, R. Optimal investment with transaction costs under cumulative prospect theory in discrete time. Math. Financ. Econ. 2017, 11, 393–421. [Google Scholar] [CrossRef]
Zou, Y.; Guo, J. Two important improvements of prospect theory—Study of the loss aversion coefficient λ and the reference point. Oper. Res. Manag. Sci. 2007, 16, 87–89. [Google Scholar]
Markowitz, H. Portfolio Selection. J. Financ. 1952, 7, 71–91. [Google Scholar]
Baucells, M.; Weber, M.; Welfens, F. Reference-point formation and updating. Manag. Sci. 2011, 57, 506–519. [Google Scholar] [CrossRef]
Strub, M.S.; Li, D. Failing to foresee the updating of the reference point leads to time-inconsistent investment. Oper. Res. 2020, 68, 199–213. [Google Scholar] [CrossRef]
van Bilsen, S.; Laeven, R.J. Dynamic consumption and portfolio choice under prospect theory. Insur. Math. Econ. 2020, 91, 224–237. [Google Scholar] [CrossRef]
He, X.D.; Strub, M.S. How endogenization of the reference point affects loss aversion: A study of portfolio selection. Oper. Res. 2022, 70, 3035–3053. [Google Scholar] [CrossRef]
Chow, V.T.F.; Cui, Z.; Long, D.Z. Target-Oriented Distributionally Robust Optimization and Its Applications to Surgery Allocation. INFORMS J. Comput. 2022, 34, 2058–2072. [Google Scholar] [CrossRef]
Noyan, N.; Rudolf, G.; Lejeune, M. Distributionally robust optimization under a decision-dependent ambiguity set with applications to machine scheduling and humanitarian logistics. INFORMS J. Comput. 2022, 34, 729–751. [Google Scholar] [CrossRef]
Zhao, Y.; Chen, Z.; Zhang, Z. Distributionally Robust Chance-Constrained p-Hub Center Problem. INFORMS J. Comput. 2023, 35, 1361–1382. [Google Scholar] [CrossRef]
Garlappi, L.; Uppal, R.; Wang, T. Portfolio selection with parameter and model uncertainty: A multi-prior approach. Rev. Financ. Stud. 2007, 20, 41–81. [Google Scholar] [CrossRef]
Zhu, S.; Fukushima, M. Worst-case conditional value-at-risk with application to robust portfolio management. Oper. Res. 2009, 57, 1155–1168. [Google Scholar] [CrossRef]
Sun, Y.; Aw, E.L.G.; Li, B.; Teo, K.L.; Sun, J. CVaR-based robust models for portfolio selection. J. Ind. Manag. Optim. 2020, 16, 1861–1871. [Google Scholar] [CrossRef]
Kang, Z.; Li, X.; Li, Z.; Zhu, S. Data-driven robust mean-CVaR portfolio selection under distribution ambiguity. Quant. Financ. 2019, 19, 105–121. [Google Scholar] [CrossRef]
Pflug, G.; Wozabal, D. Ambiguity in portfolio selection. Quant. Financ. 2007, 7, 435–442. [Google Scholar] [CrossRef]
Wozabal, D. A framework for optimization under ambiguity. Ann. Oper. Res. 2012, 193, 21–47. [Google Scholar] [CrossRef]
Gao, R.; Chen, X.; Kleywegt, A.J. Wasserstein distributionally robust optimization and variation regularization. Oper. Res. 2024, 72, 1177–1191. [Google Scholar] [CrossRef]
Blanchet, J.; Chen, L.; Zhou, X.Y. Distributionally robust mean-variance portfolio selection with Wasserstein distances. Manag. Sci. 2022, 68, 6382–6410. [Google Scholar] [CrossRef]
Bengio, Y.; Lodi, A.; Prouvost, A. Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. 2021, 290, 405–421. [Google Scholar] [CrossRef]
Hasan, F.; Kargarian, A.; Mohammadi, A. A survey on applications of machine learning for optimal power flow. In Proceedings of the 2020 IEEE Texas Power and Energy Conference (TPEC), College Station, TX, USA, 6–7 February 2020; pp. 1–6. [Google Scholar]
Koziel, S.; Leifsson, L. Surrogate-Based Modeling and Optimization; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
Baker, K. Learning warm-start points for AC optimal power flow. In Proceedings of the 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), Pittsburgh, PA, USA, 13–16 October 2019; pp. 1–6. [Google Scholar]
Dong, W.; Xie, Z.; Kestor, G.; Li, D. Smart-PGSim: Using neural network to accelerate AC-OPF power grid simulation. In Proceedings of the SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, 9–19 November 2020; pp. 1–15. [Google Scholar]
Misra, S.; Roald, L.; Ng, Y. Learning for constrained optimization: Identifying optimal active constraint sets. INFORMS J. Comput. 2022, 34, 463–480. [Google Scholar] [CrossRef]
Wächter, A.; Biegler, L.T. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 2006, 106, 25–57. [Google Scholar] [CrossRef]
Amiri, M.H.; Mehrabi Hashjin, N.; Montazeri, M.; Mirjalili, S.; Khodadadi, N. Hippopotamus optimization algorithm: A novel nature-inspired optimization algorithm. Sci. Rep. 2024, 14, 5032. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Decision-dependent loss aversion.

Figure 2. DL-CCA.

Figure 3. DCLLA network parameter analysis heat map. Darker colors indicate lower optimal values and faster solving times.

Figure 4. Comparison of Algorithms: (a) The algorithm solution time increases with the problem scale. (b) The optimal value of the algorithm increases with the scale of the problem.

Figure 5. 1 January 2019 to 31 December 2020 Stock Index Price.

Figure 6. Comparison of mean drawdown and rebalancing.

Figure 7. Cumulative wealth growth.

Table 1. Three-way ANOVA table for loss.

Source	Sum_Sq	df	F	PR (>F)
C(num_Layer)	$7.64 \times 10^{5}$	2.0	3.2942	0.040
C(hidden_Size)	$3.36 \times 10^{4}$	3.0	0.0966	0.962
C(learn_Rate)	$2.10 \times 10^{6}$	2.0	9.0533	0.000
C(num_Layer):C(hidden_Size)	$1.50 \times 10^{6}$	6.0	2.1559	0.051
C(num_Layer):C(learn_Rate)	$5.01 \times 10^{5}$	4.0	1.0797	0.369
C(hidden_Size):C(learn_Rate)	$6.02 \times 10^{5}$	6.0	0.8650	0.522
C(num_Layer):C(hidden_Size):C(learn_Rate)	$2.65 \times 10^{6}$	12.0	1.9037	0.038
Residual	$1.67 \times 10^{7}$	144.0	–	–

Table 2. Three-way ANOVA table for times.

Source	Sum_Sq	df	F	PR (>F)
C(num_Layer)	$1.29 \times 10^{4}$	2.0	822.7933	0.000
C(hidden_Size)	$1.10 \times 10^{6}$	3.0	46815.854	0.000
C(learn_Rate)	$8.74 \times 10^{2}$	2.0	55.8716	0.000
C(num_Layer):C(hidden_Size)	$1.83 \times 10^{4}$	6.0	389.5377	0.000
C(num_Layer):C(learn_Rate)	$5.57 \times 10^{1}$	4.0	1.7811	0.136
C(hidden_Size):C(learn_Rate)	$6.69 \times 10^{2}$	6.0	14.2643	0.000
C(num_Layer):C(hidden_Size):C(learn_Rate)	$1.86 \times 10^{2}$	12.0	1.9867	0.029
Residual	$1.13 \times 10^{3}$	144.0	–	–

Table 3. Comparison of algorithms.

Algorithm	Dim_Var	Iter	Constraint Violation		Sol_time (Sec.)	Opt_Obj (Min.)
Algorithm	Dim_Var	Iter	IEqConstr_vio	EqConstr_vio	Sol_time (Sec.)	Opt_Obj (Min.)
Trust-Constr	50	60	0	$2.11 \times 10^{- 7}$	* 1.67	0.0047
	100	299	0	$1.07 \times 10^{- 7}$	* 16.60	* 0.0038
	150	128	0	$1.42 \times 10^{- 7}$	* 13.75	0.0041
	200	310	0	$8.46 \times 10^{- 8}$	* 50.34	0.0036
	250	64	0	$4.94 \times 10^{- 8}$	* 20.71	0.0036
	300	217	0	$2.20 \times 10^{- 7}$	107.20	0.0043
	350	321	0	$1.27 \times 10^{- 7}$	211.05	0.0035
	400	355	0	$9.40 \times 10^{- 8}$	309.54	0.0043
	450	312	0	$2.28 \times 10^{- 7}$	480.63	* 0.0011
	500	367	0	$1.78 \times 10^{- 7}$	699.49	0.0037
	Avg.		* 0	* $1.44 \times 10^{- 7}$	191.10	0.0037
HO	50	200	$6.09 \times 10^{- 6}$	$3.18 \times 10^{- 1}$	11.22	0.0720
	100	400	0	$3.30 \times 10^{- 1}$	26.97	0.0700
	150	600	$8.24 \times 10^{- 7}$	$8.25 \times 10^{- 2}$	47.10	0.0173
	200	800	$9.29 \times 10^{- 6}$	$4.69 \times 10^{- 1}$	77.05	0.0273
	250	1000	$4.29 \times 10^{- 6}$	$4.32 \times 10^{- 1}$	107.98	0.0485
	300	1200	$2.78 \times 10^{- 6}$	$1.34 \times 10^{- 1}$	148.43	0.0114
	350	1400	$4.97 \times 10^{- 6}$	$2.83 \times 10^{- 1}$	191.57	0.0049
	400	1600	$1.24 \times 10^{- 5}$	$1.01 \times 10^{- 1}$	260.53	0.0228
	450	1800	$1.89 \times 10^{- 6}$	$9.55 \times 10^{- 2}$	296.35	0.0043
	500	2000	$1.06 \times 10^{- 6}$	$4.84 \times 10^{- 2}$	385.91	0.0134
	Avg.		$4.36 \times 10^{- 6}$	$2.29 \times 10^{- 1}$	155.31	0.0292
DL-CCA	50	2000	$1.98 \times 10^{- 6}$	$4.72 \times 10^{- 6}$	12.25	* 0.0024
	100	2000	$6.15 \times 10^{- 6}$	$1.86 \times 10^{- 6}$	22.05	0.0072
	150	2000	$8.33 \times 10^{- 7}$	$8.46 \times 10^{- 6}$	27.87	* 0.0027
	200	2000	$9.08 \times 10^{- 6}$	$6.68 \times 10^{- 6}$	52.74	* 0.0029
	250	2000	$4.53 \times 10^{- 7}$	$3.47 \times 10^{- 7}$	62.47	* 0.0027
	300	2000	$1.13 \times 10^{- 8}$	$7.87 \times 10^{- 8}$	* 77.57	* 0.0026
	350	2000	$9.13 \times 10^{- 7}$	$2.36 \times 10^{- 6}$	* 80.61	* 0.0029
	400	2000	$3.63 \times 10^{- 7}$	$1.04 \times 10^{- 4}$	* 126.44	* 0.0009
	450	2000	$8.04 \times 10^{- 7}$	$1.97 \times 10^{- 6}$	* 141.79	0.0017
	500	2000	$2.31 \times 10^{- 8}$	$1.96 \times 10^{- 7}$	* 164.05	* 0.0030
	Avg.		$2.06 \times 10^{- 6}$	$1.30 \times 10^{- 5}$	* 76.78	* 0.0029
DLCC-LSTM	50	2000	$2.37 \times 10^{- 6}$	$2.00 \times 10^{- 5}$	57.14	0.0350
	100	2000	$1.98 \times 10^{- 6}$	$6.11 \times 10^{- 6}$	93.47	0.0392
	150	2000	$2.31 \times 10^{- 5}$	$4.56 \times 10^{- 5}$	121.64	0.0286
	200	2000	$5.38 \times 10^{- 4}$	$6.64 \times 10^{- 2}$	198.92	0.0111
	250	2000	$5.61 \times 10^{- 7}$	$2.08 \times 10^{- 5}$	275.13	0.0380
	300	2000	$6.65 \times 10^{- 7}$	$2.95 \times 10^{- 5}$	300.92	0.0576
	350	2000	$4.86 \times 10^{- 7}$	$7.01 \times 10^{- 5}$	309.55	0.0526
	400	2000	$3.26 \times 10^{- 5}$	$5.30 \times 10^{- 4}$	298.91	0.0652
	450	2000	$4.65 \times 10^{- 7}$	$1.47 \times 10^{- 5}$	295.58	0.0233
	500	2000	$8.87 \times 10^{- 8}$	$5.46 \times 10^{- 6}$	292.55	0.0547
	Avg.		$6.00 \times 10^{- 5}$	$6.71 \times 10^{- 3}$	224.38	0.0405
DLCC-CNN	50	2000	$2.05 \times 10^{- 6}$	$1.10 \times 10^{- 6}$	116.69	0.0095
	100	2000	$1.52 \times 10^{- 6}$	$4.70 \times 10^{- 6}$	211.00	* 0.0060
	150	2000	$8.60 \times 10^{- 7}$	$1.92 \times 10^{- 4}$	288.77	0.0114
	200	2000	$1.23 \times 10^{- 6}$	$4.40 \times 10^{- 5}$	286.63	0.0158
	250	2000	$2.80 \times 10^{- 6}$	$3.42 \times 10^{- 4}$	345.42	0.0219
	300	2000	$1.47 \times 10^{- 5}$	$2.94 \times 10^{- 4}$	253.91	0.0518
	350	2000	$4.89 \times 10^{- 7}$	$2.71 \times 10^{- 6}$	499.29	0.0446
	400	2000	$1.06 \times 10^{- 5}$	$9.82 \times 10^{- 5}$	640.41	0.0067
	450	2000	$3.32 \times 10^{- 7}$	$1.27 \times 10^{- 2}$	710.76	0.0501
	Avg.		$3.54 \times 10^{- 6}$	$1.39 \times 10^{- 3}$	413.26	0.0279

“*” indicates the optimal indicator.

Table 4. Distribution characteristics and moment information of different market indices.

	No	Region	Mean	StdDev	Skewness	Kurtosis	Min	Max	Range	Median
SSE50.GI	43	CHN.	0.0013	0.0121	0.4471	6.7585	−0.0476	0.0628	0.1104	0.0005
HSI.HI	58	CHN.	0.0005	0.0098	0.0831	4.1764	−0.0293	0.0388	0.0680	0.0008
N225.GI	222	JPN.	0.0008	0.0078	−0.1710	4.6227	−0.0303	0.0251	0.0554	0.0010
NDX.GI	88	US.	0.0013	0.0101	−0.4328	5.9353	−0.0360	0.0448	0.0808	0.0016
FTSE.GI	36	EU.	0.0007	0.0091	−0.1730	4.8791	−0.0357	0.0308	0.0665	0.0009
FCHI.GI	36	EU.	0.0010	0.0092	−0.6072	5.5408	−0.0363	0.0324	0.0687	0.0016
GDAXI.GI	32	EU.	0.0009	0.0095	−0.3146	5.0212	−0.0336	0.0389	0.0725	0.0020
RTS.GI	35	RUS.	0.0016	0.0096	−0.3046	4.5572	−0.0395	0.0287	0.0683	0.0013

Table 5. Performance metrics across different indices using TSPO and DR-TSPO.

	SSE50 *		HSI		N225 *		NDX
	TSPO	DR-TSPO	TSPO	DR-TSPO	TSPO	DR-TSPO	TSPO	DR-TSPO
Annualized Return	0.3020	0.4635	1.7232	1.2015	0.7189	0.7257	5.2504	2.5646
Std Dev	0.5846	0.4430	0.5479	0.4516	0.4139	0.3119	0.8182	0.5849
Max Drawdown	0.4528	0.3405	0.4087	0.3579	0.3830	0.3533	0.5187	0.4320
Sharpe Ratio	0.5166	1.0463	3.1453	2.6604	1.7368	2.3268	6.4174	4.3850
Beta	1.1594	1.5259	1.3258	1.2473	1.2584	1.0399	−0.2894	−0.3315
Sortino Ratio	0.8020	1.6079	5.1512	4.0696	2.5592	3.7678	8.5639	5.4795
Autocorrelation	0.1036	0.1036	−0.1152	−0.1207	−0.0417	0.0341	−0.0523	−0.1131
Rolling Returns	0.0009	0.0014	0.0039	0.0031	0.0022	0.0022	0.0077	0.0053
Mean Wealth	1.2321	0.9923	1.3306	1.2706	1.0768	1.0866	2.1383	1.7254
	FTSE *		FCHI *		GDAXI *		RTS *
	TSPO	DR-TSPO	TSPO	DR-TSPO	TSPO	DR-TSPO	TSPO	DR-TSPO
Annualized Return	0.1632	0.1641	0.1634	0.1657	1.1646	1.1657	−0.1207	−0.0905
Std Dev	0.3459	0.3444	0.3466	0.3460	0.4586	0.4579	0.3759	0.3101
Max Drawdown	0.3794	0.3792	0.3802	0.3802	0.2496	0.2495	0.5532	0.4093
Sharpe Ratio	0.4719	0.4739	0.4715	0.4775	2.5395	2.5401	−0.3210	−0.2917
Beta	1.0025	1.0035	1.0356	1.0363	0.3834	0.3828	0.6713	0.4427
Sortino Ratio	0.5748	0.5770	0.5741	0.5814	3.3276	3.3282	−0.3572	−0.3266
Autocorrelation	0.0219	0.0217	0.0223	0.0223	−0.0538	−0.0538	0.1902	0.0411
Rolling Returns	0.0006	0.0007	0.0006	0.0007	0.0032	0.0032	−0.0005	−0.0004
Mean Wealth	0.9154	0.9159	0.9152	0.9163	1.5821	1.5828	0.7768	0.8857

“*” indicates a market where DR-TSPO is performing better, and "―" indicates a better-performing indicator.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, X.; Liu, S.; Pan, J. Two-Stage Distributionally Robust Optimization for an Asymmetric Loss-Aversion Portfolio via Deep Learning. Symmetry 2025, 17, 1236. https://doi.org/10.3390/sym17081236

AMA Style

Zhang X, Liu S, Pan J. Two-Stage Distributionally Robust Optimization for an Asymmetric Loss-Aversion Portfolio via Deep Learning. Symmetry. 2025; 17(8):1236. https://doi.org/10.3390/sym17081236

Chicago/Turabian Style

Zhang, Xin, Shancun Liu, and Jingrui Pan. 2025. "Two-Stage Distributionally Robust Optimization for an Asymmetric Loss-Aversion Portfolio via Deep Learning" Symmetry 17, no. 8: 1236. https://doi.org/10.3390/sym17081236

APA Style

Zhang, X., Liu, S., & Pan, J. (2025). Two-Stage Distributionally Robust Optimization for an Asymmetric Loss-Aversion Portfolio via Deep Learning. Symmetry, 17(8), 1236. https://doi.org/10.3390/sym17081236

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Two-Stage Distributionally Robust Optimization for an Asymmetric Loss-Aversion Portfolio via Deep Learning

Abstract

1. Introduction

2. Related Work

2.1. Loss Aversion in Financial Investments

2.2. Distributionally Robust Portfolio Optimization

3. Basic Two-Stage Portfolio Optimization Model with Decision-Dependent Loss Aversion

4. DR-TSPO Model

5. Deep Learning-Based Constraint Correction Algorithm

6. Algorithm Experiments

6.1. Analysis of Optimal Network Parameters

6.2. Results of ANOVA

6.3. Comparison of Algorithms for Solving Large-Scale DR-TSPO

6.4. Result of Algorithm Comparison

7. DR-TSPO vs. TSPO Empirical Validation

7.1. Experimental Data and Evaluation Metrics

7.2. Results and Discussions

8. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

Appendix A

Appendix B. Proof of Theorem 1

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI