1. Introduction
In recent years, the increasing frequency of geopolitical conflicts, global supply chain disruptions, and macroeconomic fluctuations has significantly heightened the uncertainty and complexity of financial markets. On the one hand, the complexity of investor behavior complicates the construction of quantitative portfolios. On the other hand, market uncertainty severely impacts asset price stability and amplifies investors’ sensitivity to risk. Researchers have observed that during periods of intense market volatility, traditional portfolio theories often overlook investors’ asymmetric sensitivity to gains and losses. Kahneman and Tversky [
1] provided an alternative theoretical framework for decision making under such uncertainty. The decision model based on prospect theory incorporates new features, such as (i) reference dependence, where decision-makers evaluate outcomes as gains or losses relative to a reference point that fluctuates with market conditions and wealth [
2], (ii) asymmetric utility, as Zhang and Semmler [
3] demonstrated that previous gains and losses in the stock market have an asymmetric impact on investment behavior, and (iii) different probabilities for evaluating gains and losses. These features lead to highly nonlinear investor utility, posing significant challenges in modeling and solving portfolio optimization problems from this perspective.
In modeling loss aversion for portfolio optimization, the focus is on characterizing the updating mechanism of the loss reference point. The application of a static reference point, set subjectively by the decision-maker, is quite limited [
4,
5] and struggles to adapt to dynamic market conditions. A widely accepted view is that the loss reference point updates in an adaptive manner [
6,
7]. Our work builds on the literature of behavioral portfolio selection based on prospect theory, which includes notable studies such as [
4,
8,
9,
10,
11,
12,
13,
14], among others. However, we further extend this framework by incorporating a dynamic, decision-dependent reference point, and we model the uncertainty of the loss reference point through ambiguity sets in the distributionally robust optimization (DRO) framework. In recent years, DRO has been widely applied in uncertainty modeling, as it avoids unreasonable assumptions about the distributional form of random variables and provides robust solutions through worst-case analysis when precise distribution information is unavailable. Within the DRO framework, we propose an update mechanism for the decision-dependent loss aversion reference point, allowing the reference point to adapt dynamically to market performance and investor behavior. We assume that the distribution of the random loss reference point depends solely on the investor’s prior decisions and is independent of asset returns, without requiring additional commitments to the specific distribution form. Specifically, the difference between prior decision returns and market returns influences the expected value of the second-stage random loss reference point, while the difference in market weights and prior decisions affects the variance of the loss reference point, as illustrated in
Figure 1. When a loss occurs, investors compare their performance to that of the market or other investors, which in turn affects their risk preferences, decision making, and expectations for future returns. This “social comparison effect” stems from a common psychological mechanism in human social behavior, whereby individuals tend to assess their own performance by comparing it to others, something that is particularly evident in financial decision making [
15].
We present the following three main contributions to the distributionally robust two-stage optimization portfolio (DR-TSPO) problem under loss aversion:
We propose an update mechanism for the loss reference point based on prior decisions, which adapts to market fluctuations and investor behavior. This mechanism captures how investors dynamically respond to market changes. We also derive the equivalent dual of the DR-TSPO problem, transforming the original problem into a second-order cone programming problem that is easier to implement, which provides a solid theoretical foundation for algorithm design and practical applications.
We develop a deep learning-based constraint correction algorithm (DL-CCA) to solve complex optimization problems with nonlinear and non-convex constraints. Specifically, the innovation of this method lies in training the neural network directly from the problem’s specifications, rather than from an existing supervised dataset, effectively handling complex non-convex constraints. Experimental results show that the DL-CCA algorithm, leveraging fully connected neural networks, outperforms Trust-Constr, HO, and LSTM/CNN-based variants in solving large-scale constrained non-convex problems, achieving superior average optimal objective values (0.0029) and faster solution times (76.78 s).
We validate the advantages of the loss aversion Distributionally Robust Two-Stage Portfolio Optimization (DR-TSPO) model in dealing with loss and uncertainty using global key stock index component data. The experimental results show that the DR-TSPO model exhibits strong robustness and lower drawdown under extreme market conditions (such as the 2020 COVID-19 pandemic). For instance, in the Chinese market, the DR-TSPO’s annual return is 0.4635, significantly higher than the TSPO’s 0.3020, with lower volatility (DR-TSPO: 0.4430 vs. TSPO: 0.5846), demonstrating stronger capital protection ability.
This study closely integrates behavioral finance theory with modern optimization methods. On the one hand, the decision-dependent loss reference point mechanism enriches the application scenarios of prospect theory in asset allocation. On the other hand, the DL-CCA algorithm provides a general solution framework for complex optimization problems in financial engineering, offering useful guidance for technological upgrades in robo-advisory and risk management. In the future, the framework can be extended to more complex settings such as multi-stage investment decisions and cross-border asset allocation.
The remainder of the paper is organized as follows.
Section 2 reviews related work on loss aversion and distributionally robust optimization.
Section 3 introduces the two-stage portfolio optimization (TSPO) model with stochastic loss reference points.
Section 4 develops the loss aversion-based DR-TSPO model and derives its tractable reformulation. In
Section 5, we propose a DL-CCA algorithm for solving large-scale DR-TSPO problems.
Section 6 presents the algorithm comparison experiments, including an efficiency comparison and an ablation study.
Section 7 designs empirical experiments using real data from global key index constituents and analyzes the results. The experimental findings demonstrate that, compared to conventional two-stage optimization models, the loss aversion-based DR-TSPO exhibits higher robustness and adaptability. Finally,
Section 8 concludes the paper and discusses future research directions.
3. Basic Two-Stage Portfolio Optimization Model with Decision-Dependent Loss Aversion
In the investment process, assume that the investor allocates all assets to the stock market and that the current portfolio weight vector is $w \in \mathcal{W}$, where $\mathcal{W}$ denotes the set of feasible portfolio weight vectors. The stock return vector $r \in \mathbb{R}^{N}$ follows a normal distribution $r \sim \mathcal{N}(\mu, \Sigma)$, so the portfolio return can be expressed as
$$R_p = w^\top r.$$
The expected return and risk of the portfolio are, respectively, given by
$$\mathbb{E}[R_p] = w^\top \mu, \qquad \mathrm{Var}(R_p) = w^\top \Sigma\, w,$$
where $\mu$ denotes the vector of expected stock returns and $\Sigma$ the covariance matrix of the returns.
For an investor with loss aversion preferences, assume that the loss aversion coefficient is $\lambda$ and that there is a psychological reference point $\ell$ (the loss reference point) against which the portfolio’s gains and losses are evaluated. When the portfolio return falls below the reference point $\ell$, the investor’s utility decreases. Specifically, the $q$-order loss aversion utility is defined as
$$U_{\mathrm{LA}}(w, \ell) = \lambda\, \mathbb{E}\big[(\ell - w^\top r)_+^{\,q}\big],$$
where $(\cdot)_+ = \max\{\cdot, 0\}$ denotes the non-negative part (i.e., when a loss occurs, the investor experiences loss aversion; when the portfolio return exceeds the reference point, the loss aversion effect is zero). In practice, the assessment of losses is closely related to the prior investment decision $w_0$. Therefore, we treat the loss reference point $\ell$ as a random variable whose distribution depends on $w_0$.
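To make the quadratic ($q = 2$) case concrete, the following minimal Python sketch estimates the loss aversion disutility from simulated return scenarios. The function name, the Monte Carlo setup, and the default coefficient value are illustrative assumptions rather than part of the model specification.

```python
import numpy as np

def loss_aversion_disutility(w, returns, ell, lam=2.25, q=2):
    """Empirical estimate of lambda * E[(ell - w'r)_+^q].

    w        : (N,) portfolio weights
    returns  : (T, N) simulated or historical return scenarios
    ell      : scalar loss reference point
    lam, q   : loss aversion coefficient and loss order (q=2 -> quadratic)
    """
    port_ret = returns @ w                       # scenario portfolio returns
    shortfall = np.maximum(ell - port_ret, 0.0)  # losses relative to the reference
    return lam * np.mean(shortfall ** q)

# Illustrative usage with random data
rng = np.random.default_rng(0)
rets = rng.normal(0.0005, 0.01, size=(1000, 5))
w = np.full(5, 0.2)
print(loss_aversion_disutility(w, rets, ell=0.0))
```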
Two-stage portfolio optimization is used in various financial and investment scenarios, aiming to enhance portfolio performance, reduce risk, and adapt to changing market conditions. It is applicable to settings in which decision making depends on random variables. Assume that the decision set $\mathcal{W}$ is related to random market events and that the investor’s prior decisions reflect the potential losses the investor may face under specific circumstances. In this framework, for each random event $k$, the decision-maker has a determined investment decision $w_k$ and a loss reference point $\ell$. Consider the two-stage portfolio optimization problem (TSPO) under a quadratic loss utility function. Let the investor’s utility function be defined as
$$U(w, \ell) = w^\top \mu - \lambda\, \mathbb{E}\big[(\ell - w^\top r)_+^{2}\big] - c\,\|w - w_0\|_1,$$
where $c$ is the transaction cost coefficient. Under the constraint of no short-selling, the two-stage portfolio optimization problem for a loss-averse investor is
$$\max_{w_0 \ge 0,\; \mathbf{1}^\top w_0 = 1}\; \Big\{\, w_0^\top \mu - \lambda\, \mathbb{E}\big[(\ell - w_0^\top r)_+^{2}\big] + \mathbb{E}_{\ell}\Big[\max_{w \ge 0,\; \mathbf{1}^\top w = 1} U(w, \ell)\Big] \Big\}.$$
After the decision variable $w_0$ is made in the first stage, the investor determines $\ell$ based on the realized gains and losses. The variable $w$ represents the decision variable in the second stage, which is made after the random loss reference point $\ell$ is realized. Under the decision $w_0$, the realization of $\ell$ can be described by a discrete probability distribution $\hat{\mathbb{P}}_{w_0}$, which contains $S$ samples of loss reference points and is defined as
$$\hat{\mathbb{P}}_{w_0} = \sum_{s=1}^{S} p_s\, \delta_{\ell_s(w_0)},$$
where:
- $p_s$ is the probability of the $s$-th sample, satisfying $p_s \ge 0$ and $\sum_{s=1}^{S} p_s = 1$, with the general assumption of equal probability ($p_s = 1/S$);
- $\ell_s(w_0)$ is the $s$-th loss reference point sample in the set determined by the decision $w_0$;
- $\delta_{\ell_s(w_0)}$ is the Dirac delta function, which places a probability mass at the corresponding point of the discrete distribution.
The discrete loss reference point samples can be obtained from historical data, expert knowledge, or by extracting reference distribution samples based on prior decision characteristics. This study primarily focuses on adaptive optimization methods, so the samples are extracted directly from the characteristics of the decision $w_0$.
We assume that for a finite set of events $k$, there is a unique decision $w_k$ and a corresponding set of random loss reference point samples $\{\ell_s(w_k)\}_{s=1}^{S}$. The expected loss values of these distributions should be adjusted based on the nature of the events and the psychological expectations of the investors, ensuring that the expected values are ordered from largest to smallest, i.e., $\mathbb{E}[\ell \mid k_5] \ge \mathbb{E}[\ell \mid k_4] \ge \mathbb{E}[\ell \mid k_3] \ge \mathbb{E}[\ell \mid k_2] \ge \mathbb{E}[\ell \mid k_1]$. This structure ensures that the event partitions not only reflect the loss aversion sentiment of market participants but also provide a clear, stepwise basis for the expected loss of the reference points. For example, according to expert opinions and historical statistics, we classify random market states into five categories of events according to the varying levels of loss aversion among market participants. Four thresholds $T_1 < T_2 < T_3 < T_4$ are set on the prior performance gap $d = w_0^\top \hat r - w_m^\top \hat r$ between the prior portfolio return and the benchmark (market) return, and these thresholds define the event classification criteria. Specifically, we define the event set $K = \{k_1, k_2, k_3, k_4, k_5\}$, where each event $k_i$ ($i = 1, \ldots, 5$) corresponds to the following:
Event $k_1$: High loss aversion, satisfying $d < T_1$. Investors have suffered significant losses in the past or experienced a large gap in returns compared to the benchmark portfolio. This leads to high loss aversion, causing investors to set a lower reference loss for the new decision round. Such events are often accompanied by sharp market declines, where some investors may underestimate the market’s recovery potential, resulting in overly pessimistic expectations. Extreme outliers cause higher sample variance.
Event $k_2$: Moderate-high loss aversion, satisfying $T_1 \le d < T_2$. Investors may have experienced some losses, but the overall loss is smaller or the return difference with the benchmark portfolio is less significant, resulting in lower loss aversion. The sample variance is smaller.
Event $k_3$: Moderate loss aversion, satisfying $T_2 \le d < T_3$. Investors may have followed a benchmark-tracking strategy, with returns similar or nearly identical to the benchmark, resulting in little additional loss.
Event $k_4$: Moderate-low loss aversion, satisfying $T_3 \le d < T_4$. Investors have achieved some excess returns compared to the benchmark portfolio and, owing to a small deviation from market strategies, exhibit some degree of risk aversion. The new reference loss is positive but relatively low. The sample variance for the current market state is also low.
Event $k_5$: Low loss aversion, satisfying $d \ge T_4$. Investors have made significant portfolio adjustments or earned returns higher than the market. The new reference loss is high. Some investors may exhibit overconfidence, where overestimating their own abilities influences their decisions and expectations, causing outlier sample variance among those pursuing higher returns.
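As a concrete illustration of this event partition, the short Python sketch below maps the prior performance gap to one of the five events; the threshold values, the gap definition, and the function name are illustrative assumptions.

```python
import numpy as np

def classify_event(prior_ret, bench_ret, thresholds=(-0.10, -0.03, 0.0, 0.03)):
    """Map the prior performance gap d = prior_ret - bench_ret to one of the
    five loss aversion events k1..k5 via four thresholds T1 < T2 < T3 < T4.
    Threshold values here are purely illustrative."""
    d = prior_ret - bench_ret
    T1, T2, T3, T4 = thresholds
    if d < T1:
        return "k1"  # high loss aversion
    elif d < T2:
        return "k2"  # moderate-high loss aversion
    elif d < T3:
        return "k3"  # moderate loss aversion
    elif d < T4:
        return "k4"  # moderate-low loss aversion
    return "k5"      # low loss aversion

print(classify_event(prior_ret=-0.12, bench_ret=0.01))  # -> "k1"
```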
However, a finite set of random events is insufficient to describe the complexity of market states. When the thresholds $T$ are infinitely subdivided (or the time intervals are sufficiently small), market uncertainty is modeled by an infinite number of events ($|K| \to \infty$) and an infinite number of feasible decisions $w_k$. By representing the sample set as a continuous function of the random events, we introduce a mapping function $g$ that maps the decision associated with each random event to the reference distribution information space of the loss reference point, $(\mu_\ell, \sigma_\ell^2)$, and then generates discrete samples $\ell_s(w_0)$, for $s = 1, \ldots, S$, according to the specified distribution moments. For each random event, the following relationship is set based on the reference distribution mean $\mu_\ell(w_0)$ and variance $\sigma_\ell^2(w_0)$:
$$\mu_\ell(w_0) = \alpha\,\big(w_0^\top \hat r - w_m^\top \hat r\big), \qquad \sigma_\ell^2(w_0) = \gamma\,\|w_0 - w_m\|^2,$$
where $\alpha$ and $\gamma$ serve as adjustment coefficients, $w_m$ represents the market portfolio weights, and $\hat r$ denotes the realized first-stage return vector. The term $w_0^\top \hat r - w_m^\top \hat r$ reflects the investor’s prior gains or losses: when a loss occurs, $w_0^\top \hat r - w_m^\top \hat r < 0$; otherwise, if there is no loss, $w_0^\top \hat r - w_m^\top \hat r \ge 0$. The norm distance $\|w_0 - w_m\|$ measures the degree of deviation between the investor’s portfolio and the market portfolio, indicating the level of active management. In an efficient market, where information disseminates rapidly and prices adjust swiftly to all available information, investors tend to adopt passive investment strategies to track market indices. In this environment, active management struggles to generate consistent excess returns. Additionally, in low-volatility markets, investors are more inclined toward passive management (small $\|w_0 - w_m\|$), aiming for relatively stable returns. Conversely, in an inefficient market, information asymmetry and the market’s failure to fully reflect fundamentals make active management strategies more attractive, as investors can exploit these inefficiencies to achieve excess returns. In high-volatility markets, where uncertainty and price fluctuations are significant, investors also prefer active management (large $\|w_0 - w_m\|$) to seize short-term investment opportunities.
Specifically, let the sample vector $\boldsymbol{\ell}(w_0) \in \mathbb{R}^{S}$ be
$$\boldsymbol{\ell}(w_0) = \mu_\ell(w_0)\,\mathbf{1}_S + \sigma_\ell(w_0)\,\boldsymbol{\varepsilon},$$
where $\mathbf{1}_S$ is the vector of size $S$ with all elements equal to 1, and the noise vector $\boldsymbol{\varepsilon}$ of dimension $S$ consists of independent and identically distributed (i.i.d.) elements that follow $\boldsymbol{\varepsilon} \sim \mathcal{N}(0, I_S)$, with $I_S$ the $S$-dimensional identity matrix. The larger the prior loss, the higher the overall expected value of the samples; the greater the deviation of the decision from the market, i.e., the larger $\|w_0 - w_m\|$, the higher the uncertainty regarding the loss reference, resulting in a larger overall variance. The sample set is defined as
$$\mathcal{L}(w_0) = \big\{\ell_1(w_0), \ell_2(w_0), \ldots, \ell_S(w_0)\big\}.$$
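The following short Python sketch illustrates this decision-dependent sample generation under the assumptions above (mean driven by the prior return gap, standard deviation proportional to the deviation from the market weights); the coefficient values and the function name are illustrative.

```python
import numpy as np

def sample_loss_references(w0, w_mkt, realized_ret, S=100,
                           alpha=1.0, gamma=0.5, seed=0):
    """Draw S decision-dependent loss reference samples ell_s(w0).

    Mean  : alpha * (prior portfolio return - market return)
    Stdev : gamma * ||w0 - w_mkt||  (deviation from the market portfolio)
    """
    rng = np.random.default_rng(seed)
    mu_ell = alpha * (w0 @ realized_ret - w_mkt @ realized_ret)
    sigma_ell = gamma * np.linalg.norm(w0 - w_mkt)
    return mu_ell + sigma_ell * rng.standard_normal(S)

# Illustrative usage
w0 = np.array([0.4, 0.3, 0.3])
w_mkt = np.array([1/3, 1/3, 1/3])
realized = np.array([-0.02, 0.01, 0.005])
ells = sample_loss_references(w0, w_mkt, realized)
print(ells.mean(), ells.std())
```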
According to the sample estimate, the TSPO can be approximated as a single-stage optimization problem.
By introducing auxiliary variables, the problem becomes equivalent to a second-order cone programming (SOCP) formulation.
The definitions of the notation are presented in Appendix A.
4. DR-TSPO Model
In fact, the reference distribution may not accurately reflect the real situation, especially in scenarios with sparse data or noise. This bias can lead to suboptimal decisions in practical applications, increasing potential risks. To address the uncertainty in real-world distributions, two-stage distributionally robust optimization (DRO) adopts a more conservative approach. By constructing a distributional ambiguity set that encompasses all possible true distributions, the DRO optimization scheme targets the worst-case distribution for optimization. This “worst-case” approach effectively mitigates the impact of the reference distribution deviating from the true distribution, providing more robust decisions. Additionally, in high-uncertainty situations, it better balances risk and return, offering more reliable support for real-world decision making.
To account for the uncertainty of the loss reference point $\ell$, we introduce an ambiguity set based on the Wasserstein distance to characterize the distribution of the loss reference point. Specifically, the Wasserstein distance is used to measure the gap between the true distribution and the reference distribution. The uncertainty set $\mathcal{F}(w_0)$ for the true distribution is defined as
$$\mathcal{F}(w_0) = \Big\{\, \mathbb{P} \in \mathcal{P}(\Xi) \;:\; W_2\big(\mathbb{P}, \hat{\mathbb{P}}_{w_0}\big) \le \epsilon \,\Big\},$$
where $\mathcal{P}(\Xi)$ represents the collection of all Borel probability distributions on $\Xi$, and $\Xi$ is a prescribed conic representable support set. $\hat{\mathbb{P}}_{w_0}$ is the reference distribution of the loss reference point determined by the prior decision $w_0$. Although this reference distribution is discrete, the ambiguity set can encompass both discrete and continuous distributions. $W_2(\cdot, \cdot)$ is the Wasserstein distance metric between two distributions, which measures the minimum cost required to transform one distribution into another, typically understood as the “transportation cost” in geographical space. The Type-2 Wasserstein distance, through its quadratic penalty mechanism, enables more refined control over higher-order distributional characteristics. This proves particularly advantageous when balancing mean–variance trade-offs, handling extreme events, or addressing high-dimensional correlations. However, its computational cost may be significantly higher, necessitating careful consideration of the optimal order selection based on specific problem requirements.
To construct a concrete optimization model and link the ambiguity set to the prior decision $w_0$, we use the Wasserstein-distance-based ambiguity set to capture the distributional uncertainty of $\ell$. The size of this ambiguity set can be adjusted by the parameter $\epsilon$, which represents the radius of the ambiguity sphere and controls the degree of uncertainty. The Type-2 Wasserstein distance is defined as
$$W_2\big(\mathbb{P}_1, \mathbb{P}_2\big) = \left(\inf_{\pi \in \Pi(\mathbb{P}_1, \mathbb{P}_2)} \mathbb{E}_{(\ell, \ell') \sim \pi}\big[\,|\ell - \ell'|^2\,\big]\right)^{1/2},$$
where $\Pi(\mathbb{P}_1, \mathbb{P}_2)$ denotes the set of all joint distributions with marginal distributions $\mathbb{P}_1$ and $\mathbb{P}_2$, respectively, and $\ell$ and $\ell'$ are samples drawn from these distributions. The ambiguity set requires that the Wasserstein distance between any admissible distribution $\mathbb{P}$ and the reference distribution $\hat{\mathbb{P}}_{w_0}$ does not exceed $\epsilon$, thereby introducing uncertainty management.
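As a quick numerical illustration, for two one-dimensional empirical distributions with equally many equally weighted atoms, the Type-2 Wasserstein distance reduces to matching sorted samples. The snippet below is a minimal sketch under that equal-weight, equal-size assumption, not part of the model implementation.

```python
import numpy as np

def wasserstein2_empirical(x, y):
    """Type-2 Wasserstein distance between two 1-D empirical distributions
    with the same number of equally weighted samples: the optimal coupling
    matches sorted samples."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    assert x.shape == y.shape, "equal sample sizes assumed"
    return np.sqrt(np.mean((x - y) ** 2))

rng = np.random.default_rng(1)
ref = rng.normal(0.0, 1.0, 500)      # reference samples of the loss reference point
shifted = rng.normal(0.3, 1.2, 500)  # a candidate "true" distribution
print(wasserstein2_empirical(ref, shifted))
```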
When considering transaction costs, the investor’s objective is to minimize both the loss and the necessary costs. Specifically, the investor needs to consider not only the expected return $w^\top \mu$, but also loss aversion, ambiguity losses, and upper bounds on transaction costs. In this case, the investor’s objective function can be expressed as
$$\max_{w \in \mathcal{W}}\; \inf_{\mathbb{P} \in \mathcal{F}(w_0)}\; \mathbb{E}_{\mathbb{P}}\Big[\, w^\top \mu - \lambda\,(\ell - w^\top r)_+^{2} - c\,\|w - w_0\|_1 \Big].$$
This objective function combines the prior decision, loss aversion preferences, and market transaction costs, aiming for effective asset allocation by maximizing the net returns of the two-stage portfolio under the worst-case scenario. Considering the two-stage distributionally robust portfolio optimization problem (DR-TSPO) under a quadratic loss utility function, the optimization problem can be expressed as
$$\max_{w_0 \in \mathcal{W}}\; \Big\{\, w_0^\top \mu - \lambda\, \mathbb{E}\big[(\ell - w_0^\top r)_+^{2}\big] + \inf_{\mathbb{P} \in \mathcal{F}(w_0)}\; \mathbb{E}_{\mathbb{P}}\Big[\max_{w \in \mathcal{W}} U(w, \ell)\Big] \Big\}. \tag{25}$$
Here, $c$ is the cost coefficient for buying and selling stocks. The utility model objective in Equation (25) considers the possible true distribution of the loss reference point $\ell$ based on the prior decision and aims to maximize the net returns of the two-stage portfolio in the worst-case scenario. Unlike two-stage robust optimization, the distributionally robust optimization framework effectively avoids “over-conservatism,” achieving a more balanced result in practical applications. It is particularly suitable for situations where uncertainty is high or difficult to quantify directly. However, the DR-TSPO has higher computational complexity because it requires handling probability distributions, optimizing over distributional uncertainty, and possibly estimating distribution parameters. These challenges necessitate the design of appropriate algorithms to address the distributional uncertainties.
According to hierarchical optimization and dual theory, we reformulate the DR-TSPO into a more tractable deterministic two-stage second-order cone program.
Theorem 1.
The DR-TSPO is equivalent to solving a deterministic two-stage nonlinear constrained optimization problem. The proof is given in Appendix B.
The DR-TSPO optimization problem is a multi-variable, multi-constraint non-convex optimization problem. The non-convexity primarily arises from the quadratic nonlinear terms in the constraints and the complex coupling of variables (such as the nonlinear dependence of the reference point samples on the first-stage decision $w_0$). Additionally, the variance equality constraint for the scenario variables and the absolute value terms further complicate the solution process. In practical applications, because the numbers of stocks $N$ and scenarios $S$ are large, the problem’s scale increases significantly, resulting in high computational costs. Therefore, it is necessary to design appropriate algorithms that reduce computational complexity by relaxing constraints, decomposing the problem, or introducing penalty terms, while ensuring feasibility and obtaining high-quality approximate solutions quickly.
5. Deep Learning-Based Constraint Correction Algorithm
To reduce the resource usage and time cost associated with solving non-convex constraints in large-scale optimization problems, we design a deep learning-based constraint correction algorithm (DL-CCA) for non-convex constrained optimization. Beyond the traditional optimization literature, substantial research efforts in deep learning have focused on developing approximations or acceleration techniques for optimization models. As evidenced by comprehensive reviews in fields like combinatorial optimization [
32] and optimal power flow [
33], current machine learning applications for optimization acceleration primarily follow two distinct methodologies.
One methodology, conceptually similar to surrogate modeling techniques [
34], trains machine learning models to directly predict complete solutions from optimization inputs. Nevertheless, these methods frequently encounter challenges in generating solutions that simultaneously satisfy feasibility and near-optimality conditions. Alternatively, a second methodology integrates machine learning within optimization frameworks, either in conjunction with or embedded in the solution process. Examples include learning effective warm-start initializations [
35,
36] or employing predictive models to identify active constraints, thereby enabling constraint reduction strategies [
37].
In this work, we consider solving a series of optimization problems in which the objective or constraints differ across instances. Formally, let $y^\ast$ represent the solution to the corresponding optimization problem. For any given parameters, our goal is to find the optimal solution $y^\ast$ of
$$\min_{y}\; f(y) \quad \text{s.t.} \quad g(y) \le 0, \;\; h(y) = 0.$$
Here, $f$, $g$, and $h$ may be nonlinear and non-convex. We consider using deep learning methods to solve this task—specifically, training a neural network $\mathcal{N}_\theta$ parameterized by $\theta$ to adjust a multi-dimensional random solution $y_0$ into an approximate optimal solution that satisfies the constraints $g(y) \le 0$ and $h(y) = 0$. This approach allows difficult non-convex constraints to be integrated into the neural network training process. The method enables training directly from the problem specification (rather than a supervised dataset). Additionally, we incorporate equality constraint adjustment layers at both the input and output of the neural network model to adjust partial solutions so that they satisfy the equality constraints. The algorithm learns to minimize a composite loss that includes the objective and two “soft loss” terms, which penalize violations of the equality and inequality constraints:
$$\mathcal{L}(y) = f(y) + \rho_g\,\big\|\max\{g(y), 0\}\big\|_2^2 + \rho_h\,\big\|h(y)\big\|_2^2.$$
First, the training set is constructed by generating random solution samples with explicit equality constraint conditions. Specifically, we design an EqCompletion_Layer to handle the constraints, which includes normalizing the portfolio weights and adjusting the mean of the loss sample to ensure that the specific equality constraint in Equation (33) is satisfied. At the same time, this layer optimizes the relationship between the constraints by adjusting factors, ensuring feasibility during the optimization process. During the solving process, our model needs to satisfy both inequality and equality constraints. To this end, the penalty term in the loss function includes constraints (31) (which relates to rebalancing costs, the second-stage portfolio returns, and other variables), constraint (32) (used to adjust portfolio loss reference point sample differences), and a series of non-negative constraints (35). These constraints are explicitly incorporated into the loss function penalty term using the L2 norm, which adjusts the variables to ensure that each solution is as close as possible to the feasible region.
We use a neural network model with a fully connected structure, where the input layer size matches the dimensionality of the training data. The hidden layers employ ReLU activation functions, and the final output corresponds to the decision variables of the optimization problem. The optimization process uses the Adam optimizer in combination with an exponentially decaying learning rate scheduler to gradually reduce the learning rate, improving convergence and stability. The loss function includes penalty terms for multiple constraints, with the weight of each term adjusted through penalty factors ($\rho_g$, $\rho_h$) to ensure that the model can effectively balance the objective function and the constraints during training.
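A minimal PyTorch sketch of such a constraint-penalized loss is given below; the function names, the penalty weights, and the way constraints are passed in are illustrative assumptions rather than the exact implementation.

```python
import torch

def constraint_penalized_loss(y, objective_fn, ineq_fns, eq_fns,
                              rho_g=10.0, rho_h=10.0):
    """Composite loss: objective + soft penalties for g(y) <= 0 and h(y) = 0."""
    loss = objective_fn(y)
    for g in ineq_fns:                         # inequality constraints g(y) <= 0
        loss = loss + rho_g * torch.clamp(g(y), min=0.0).pow(2).sum()
    for h in eq_fns:                           # equality constraints h(y) = 0
        loss = loss + rho_h * h(y).pow(2).sum()
    return loss

# Toy usage: minimize ||y||^2 subject to sum(y) = 1 and y >= 0
y = torch.nn.Parameter(torch.randn(5))
opt = torch.optim.Adam([y], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = constraint_penalized_loss(
        y,
        objective_fn=lambda v: (v ** 2).sum(),
        ineq_fns=[lambda v: -v],               # -y <= 0  <=>  y >= 0
        eq_fns=[lambda v: v.sum() - 1.0],
    )
    loss.backward()
    opt.step()
print(y.detach())
```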
Figure 2 illustrates the DL-CCA framework, and Algorithm 1 provides the corresponding pseudocode.
Algorithm 1: Deep learning-based constraint correction algorithm (DL-CCA)
1: Assume: an equality completion procedure (EqCompletion_Layer) that corrects candidate solutions so that the equality constraints are satisfied.
2: Initialize a random sample solution $y_0$.
3: Input: training set of candidate solutions $\mathcal{D}$, learning rate (LR) $\eta$.
4: Initialize the neural network $\mathcal{N}_\theta$.
5: for epoch = 1 to epochs do
6:   Compute the output of the neural network layer: $\tilde{y} = \mathcal{N}_\theta(y_0)$.
7:   Sample averaging layer processing of $\tilde{y}$.
8:   Equality constraint correction: $\hat{y} = \mathrm{EqCompletion\_Layer}(\tilde{y})$.
9:   Compute the constraint-regularized loss: $\mathcal{L}(\hat{y}) = f(\hat{y}) + \rho_g \|\max\{g(\hat{y}), 0\}\|_2^2 + \rho_h \|h(\hat{y})\|_2^2$.
10:  Update $\theta$ using $\nabla_\theta \mathcal{L}(\hat{y})$.
11:  if epoch % 100 == 0 then
12:    Update LR: $\eta \leftarrow \gamma_{\mathrm{LR}}\, \eta$.
13:  end if
14: end for
15: Decode the optimal solution $y^\ast$.
16: return $y^\ast$
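For concreteness, a compact PyTorch sketch of a training loop in the spirit of Algorithm 1 is shown below on a generic constrained toy problem; the network size, the shift-based equality completion step, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy instance: minimize ||Q y - b||^2  s.t.  sum(y) = 1,  y >= 0
torch.manual_seed(0)
n = 10
Q, b = torch.randn(n, n), torch.randn(n)

net = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, n))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.95)
rho_g = 10.0

y0 = torch.randn(128, n)                            # step 2: random candidate solutions

for epoch in range(1, 1001):                        # step 5
    y_tilde = net(y0)                               # step 6: network output
    # step 8: equality completion -- shift each row so its entries sum to 1
    y_hat = y_tilde + (1.0 - y_tilde.sum(dim=1, keepdim=True)) / n
    f = ((y_hat @ Q.T - b) ** 2).sum(dim=1)         # objective values
    g_viol = torch.clamp(-y_hat, min=0.0).pow(2).sum(dim=1)  # penalty for y >= 0
    loss = (f + rho_g * g_viol).mean()              # step 9: constraint-regularized loss
    opt.zero_grad()
    loss.backward()
    opt.step()                                      # step 10: update theta
    if epoch % 100 == 0:                            # steps 11-13: decay the learning rate
        sched.step()

best = torch.argmin(f)                              # steps 15-16: decode the best candidate
print(y_hat[best].detach())
```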
6. Algorithm Experiments
6.1. Analysis of Optimal Network Parameters
We experimentally investigated the impact of three key parameters—neural network depth (num_Layer), hidden layer dimension (hidden_Size), and learning rate (learn_Rate)—on the average loss value (Avg.loss_Value) and average solution time (Avg.times) of the DL-CCA algorithm. A systematic analysis of the experimental results was conducted using heatmaps and three-way analysis of variance (ANOVA), as illustrated in
Figure 3. The experiments addressed the DR-TSPO problem with a decision variable dimension of 200, and repeated trials were performed within each parameter range to obtain average values.
6.2. Results of ANOVA
Through heatmap analysis, this study observed significant variations in model performance under different parameter combinations. In terms of solution accuracy, network complexity—particularly the hidden layer dimension (hidden_Size) and network depth (num_Layer)—emerged as critical factors. The experimental results demonstrated that when the hidden layer dimension was set to 256, which is close to the problem’s solution dimension, the model exhibited superior solution accuracy, as evidenced by a significant reduction in the average loss value across multiple trials. However, it is noteworthy that increasing network complexity, especially the hidden layer dimension, significantly prolonged the model’s solution time, potentially leading to higher computational costs when tackling large-scale optimization problems. Additionally, the learning rate setting played a crucial role in both the search accuracy for optimal solutions and computational efficiency. Experimental data indicated that a learning rate of approximately 0.001 yielded optimal performance.
Furthermore, through a 3-way ANOVA, we have statistically elucidated the significance of the individual parameters and their interaction effects on model performance. Initially, from the perspective of target loss as presented in
Table 1, the learning rate (learn_Rate) exerts the most significant influence on the loss (F = 9.0533,
p < 0.001), underscoring its pivotal role in determining model performance. The number of layers (num_Layer) also significantly affects the loss (F = 3.2942,
p = 0.040), albeit with a relatively smaller effect size, suggesting that increasing the number of layers may optimize the loss to some extent, but the improvement is limited. In contrast, the size of the hidden layer (hidden_Size) does not significantly impact the loss (F = 0.0966,
p = 0.962), indicating a weaker direct influence on model performance. Additionally, the interaction among the three factors is significant (
p = 0.038), revealing that the combination of these parameters may exert complex nonlinear effects on the loss.
From the perspective of computational time, network complexity is the primary factor influencing the speed of solution. As shown in
Table 2, the size of the hidden layer (hidden_Size) has the most significant impact on the solution time (F = 46815.854,
p < 0.001), with an extremely high effect size, indicating that an increase in hidden layer size significantly escalates computational complexity. Furthermore, the number of hidden layers (num_Layer) also significantly affects the solution time (F = 822.7933,
p < 0.001), although its effect size is slightly lower than that of the hidden layer size, suggesting that increasing the number of layers also adds to the computational burden, albeit to a lesser extent. Although the learning rate (learn_Rate) significantly influences the solution time as well (F = 55.8716,
p < 0.001), its effect size is relatively low, indicating a limited impact on computational efficiency, with its setting primarily aimed at ensuring solution accuracy. A larger learning rate accelerates gradient descent but compromises solution precision, necessitating a judicious setting of the learning rate. Interaction analysis further reveals significant interactions between the number of layers and hidden layer size (
p < 0.001), as well as between hidden layer size and learning rate (
p < 0.001), with these parameter combinations further affecting solution time. Additionally, the three-way interaction among these factors also reaches a significant level (
p = 0.029), further corroborating the intricate relationships among the parameters.
In summary, the learning rate is a pivotal parameter that ensures the accuracy of the solution, while the size and number of hidden layers significantly influence the computation time. The impact of the three key parameters (num_Layer, hidden_Size, and learn_Rate) on the algorithm’s application is not independent; their interactions are also crucial, particularly in terms of computation time, where the combined effects of multiple factors can substantially increase computational complexity. Therefore, in practical model optimization, it is essential to consider both the individual effects of each parameter and their interactions to achieve a balance between model performance and computational efficiency.
6.3. Comparison of Algorithms for Solving Large-Scale DR-TSPO
To validate the efficiency of the DL-CCA algorithm, which is based on deep neural networks, in solving large-scale complex constrained non-convex problems, we designed a series of comparative experiments. These included the Trust-Constr algorithm from the Scipy library, the heuristic Hippopotamus Optimization (HO) algorithm, and ablation studies on the neural network architectures embedded within the DL-CCA algorithm (fully connected neural networks, LSTM, and CNN). The experimental results demonstrate that the DL-CCA algorithm exhibits significant advantages in terms of solution accuracy, computation time, and constraint violation, with its performance benefits being particularly pronounced in high-dimensional problems.
Trust-Constr is a modern variant based on the trust-region method, which integrates interior-point and trust-region methods to construct an efficient algorithm for solving optimization problems with nonlinear constraints. It is capable of handling general nonlinear constraints while ensuring the feasibility of constraints at each iteration and stabilizing the optimization process through dynamic adjustment of the trust-region size [
38]. The Hippopotamus Optimization Algorithm (HO) is a novel metaheuristic algorithm (intelligent optimization algorithm) inspired by the inherent behaviors of hippopotamuses. Research indicates that the HO algorithm outperforms the SSA algorithm on most functions [
39]. According to real return data from S&P 500 constituent stocks, we evaluated the effectiveness of various algorithms in solving the non-convex optimization problem DR-TSPO. Specifically, as the scale of the problem increases (portfolio expansion), we assessed the solving efficiency of the DL-CCA algorithm compared to different algorithms and network architectures.
Speed: The time or number of iterations required for an algorithm to find the optimal solution when solving optimization problems with a large number of variables and constraints. Solution speed is influenced by several factors, including problem size, algorithm complexity, problem structure (such as sparsity or nonlinearity), and available hardware and computational resources.
Feasibility: Feasibility refers to whether the obtained solution satisfies all the given constraints. For constrained optimization problems, feasibility can be measured by how well the constraints are satisfied, particularly with respect to equality and inequality constraints. The average constraint violation is defined as
$$\bar{v} = \frac{1}{m} \sum_{i=1}^{m} v_i,$$
where $m$ is the number of constraints and $v_i$ represents the violation of the $i$-th constraint.
Optimality: This refers to whether the algorithm can converge within a finite number of iterations. The iteration limit is set to 2000, and the convergence condition is defined by a gradient tolerance threshold.
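The following small NumPy helper illustrates the feasibility metric above for a list of constraint residuals; the convention that inequality residuals are clipped at zero and equality residuals enter in absolute value is an assumption of this sketch.

```python
import numpy as np

def average_constraint_violation(eq_residuals, ineq_residuals):
    """Average violation over m = len(eq) + len(ineq) constraints.

    eq_residuals   : values of h_i(y); violation is |h_i(y)|
    ineq_residuals : values of g_i(y) for constraints g_i(y) <= 0;
                     violation is max(g_i(y), 0)
    """
    eq_viol = np.abs(np.asarray(eq_residuals))
    ineq_viol = np.maximum(np.asarray(ineq_residuals), 0.0)
    viols = np.concatenate([eq_viol, ineq_viol])
    return viols.mean() if viols.size else 0.0

print(average_constraint_violation([1e-6, -2e-5], [-0.1, 0.02]))
```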
6.4. Result of Algorithm Comparison
Table 3 presents the results of solving the DR-TSPO problem over a range of variable dimensions. We compared the average violation of the equality/inequality constraints between the DL-CCA and Trust-Constr algorithms under the task of seeking the optimal objective value. Additionally, as the problem scale increased, we compared the total runtime required by both algorithms on the test instances, assuming full parallelization.
First, in terms of solution accuracy, the DL-CCA algorithm achieved significantly lower optimal objective values across all dimensions compared to the HO and Trust-Constr algorithms. For instance, in one test dimension the optimal objective value for DL-CCA was 0.0024, while those for Trust-Constr and HO were 0.0047 and 0.0720, respectively; in another, the optimal objective value for DL-CCA was 0.0030, compared to 0.0037 and 0.0134 for Trust-Constr and HO, respectively. This indicates that the DL-CCA algorithm is more effective in approximating the global optimal solution, particularly in high-dimensional problems, where its precision advantage becomes more pronounced.
In terms of computation time, the DL-CCA algorithm also outperformed the comparative algorithms. Specifically, when solving high-dimensional, large-scale optimization problems, the experimental results demonstrate that DL-CCA achieves faster solution times. Even in low-dimensional problems, DL-CCA's computation time remained comparable to that of the HO algorithm; for example, in one low-dimensional instance, DL-CCA required 12.25 s versus 11.22 s for HO, while delivering substantially higher solution accuracy. Furthermore, as the problem dimension increased, the computation time of the exact algorithm rose sharply, whereas DL-CCA's computation time grew more gradually, indicating its superior computational efficiency in high-dimensional problems.
Although the Trust-Constr algorithm exhibits slightly better performance in terms of inequality constraint violations, its equality constraint violations are comparable to those of DL-CCA, while the HO algorithm shows significantly higher equality constraint violations than DL-CCA. This indicates that the DL-CCA algorithm is better at satisfying constraint conditions during the optimization process, thereby ensuring solution feasibility. In the ablation experiments of the DL-CCA algorithm, we observed that the DL-CCA algorithm employing a fully connected neural network (FCNN) outperformed those using LSTM and CNN in both solution accuracy and computation time. Specifically, the average optimal objective value for DL-CCA-FCNN was 0.0029, compared to 0.0405 and 0.0279 for DL-CCA-LSTM and DL-CCA-CNN, respectively. Additionally, the average computation time for DL-CCA-FCNN was 76.78 s, significantly lower than the 224.38 s for DL-CCA-LSTM and 413.26 s for DL-CCA-CNN.
Figure 4 illustrates the trends in computation time and optimal values as the problem scale increases for various algorithms. The DL-CCA algorithm demonstrates significant advantages over the exact solver Trust-Constr and the heuristic algorithm HO in terms of solution accuracy and computation time when addressing large-scale complex constrained non-convex problems. Furthermore, the DL-CCA algorithm utilizing FCNN outperforms those employing LSTM and CNN, further validating the effectiveness of FCNN in handling high-dimensional nonlinear optimization problems. These experimental results robustly demonstrate the efficiency and robustness of the DL-CCA algorithm in practical applications.
7. DR-TSPO vs. TSPO Empirical Validation
To validate the advantages of the DR-TSPO model based on loss aversion, we design a comparative experiment to benchmark it against the traditional two-stage portfolio optimization (TSPO) model. The core of the experiment is to examine the model's performance under different market distributions when the loss reference point is treated as a random variable. The experiment optimizes a set of real market data using both models and assesses their robustness and loss aversion capabilities in practical decision making. By comparing the portfolio's average drawdown, return, volatility, and risk-adjusted performance metrics, we aim to demonstrate the adaptability and advantages of the loss aversion-based DR-TSPO model in markets with unknown loss reference distributions.
7.1. Experimental Data and Evaluation Metrics
Between 2019 and 2020, global financial markets experienced significant volatility and high uncertainty. The market trend in 2019 was relatively stable, while the outbreak of the COVID-19 pandemic in 2020 triggered a global financial crisis, leading to sharp market fluctuations, a significant drop in stock prices, and a further intensification of the economic recession. Against this backdrop, the experiment can more effectively test the risk transmission of portfolios in both normal market conditions (2019) and extreme market scenarios (2020), thereby assessing their robustness and resilience during financial crises, as shown in
Figure 5. To achieve this, we selected constituent stocks from eight global market indices with different distribution characteristics, including SSE50, Hang Seng Index (HSI), FTSE Index, French CAC40 Index (FCHI), German DAX Index (GADXI), Russian RTS Index (RTS), Nikkei 225 Index (N225), and Nasdaq 100 Index (NDX). These indices represent the economic conditions and market volatility of different regions globally, providing high distribution diversity and representativeness. Specific information and distribution characteristics are presented in
Table 4.
The experiment uses data from 1 January 2019 to 31 December 2019 as the in-sample data for constructing the first-stage portfolio; the second stage is based on 2020 data to assess the portfolio’s out-of-sample performance. The evaluation focuses on two main aspects. (1) First, the portfolio’s ability to avoid losses during the financial crisis (the global pandemic and economic recession in 2020) is assessed. Losses are characterized by comparing the average drawdown and the portfolio rebalancing magnitude in both stages. If the out-of-sample average drawdown decreases as a result, the model is considered to have better robustness and effectively reduces the average loss in the second stage. (2) Second, traditional metrics will be used to evaluate the portfolio’s return and volatility.
Annualized Return: Mean annualized portfolio return.
Standard deviation: Standard deviation of annualized portfolio return.
Maximum Drawdown: Maximum portfolio drawdown.
Sharpe Ratio: The Sharpe ratio represents the excess return per unit of total risk taken, $\mathrm{Sharpe} = (R_p - R_f)/\sigma_p$, where $R_f$ is the risk-free rate and $\sigma_p$ the standard deviation of portfolio returns.
Beta: Beta represents the relationship between portfolio return volatility and market return volatility and is a measure of systematic risk, $\beta = \mathrm{Cov}(R_p, R_m)/\mathrm{Var}(R_m)$.
Sortino Ratio: The risk-adjusted return of a portfolio after accounting for downside volatility, $\mathrm{Sortino} = (R_p - R_f)/\sigma_d$, where $\sigma_d$ is the downside deviation.
Autocorrelation: A statistic used to measure the correlation between a time series and itself at different time points, $\rho_k = \mathrm{Corr}(R_t, R_{t-k})$ for lag $k$.
Rolling Returns: An indicator that calculates the return of a portfolio or asset over a moving window.
Mean Wealth: Average wealth in the second stage.
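As a reference, a brief NumPy sketch for several of these metrics is provided below; the annualization factor of 252 trading days, the zero risk-free rate, and the simple-return convention are assumptions for illustration.

```python
import numpy as np

def performance_metrics(port_ret, mkt_ret, rf=0.0, periods=252):
    """Compute annualized return/volatility, Sharpe, Sortino, max drawdown, and beta
    from daily simple returns (1-D arrays of equal length)."""
    port_ret, mkt_ret = np.asarray(port_ret), np.asarray(mkt_ret)
    ann_ret = np.mean(port_ret) * periods
    ann_vol = np.std(port_ret, ddof=1) * np.sqrt(periods)
    sharpe = (ann_ret - rf) / ann_vol
    downside = port_ret[port_ret < 0]
    sortino = (ann_ret - rf) / (np.std(downside, ddof=1) * np.sqrt(periods))
    wealth = np.cumprod(1.0 + port_ret)
    max_dd = np.max(1.0 - wealth / np.maximum.accumulate(wealth))
    beta = np.cov(port_ret, mkt_ret, ddof=1)[0, 1] / np.var(mkt_ret, ddof=1)
    return dict(ann_return=ann_ret, ann_vol=ann_vol, sharpe=sharpe,
                sortino=sortino, max_drawdown=max_dd, beta=beta)

rng = np.random.default_rng(2)
mkt = rng.normal(0.0004, 0.012, 252)
port = 0.8 * mkt + rng.normal(0.0002, 0.006, 252)
print(performance_metrics(port, mkt))
```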
7.2. Results and Discussions
First, the experiment compares the mean drawdown and the two-stage rebalancing magnitude of the portfolio on out-of-sample data. The results in
Figure 6 show significant regional differences under different market conditions (i.e., different distribution scenarios), especially among the EU countries represented by FTSE/FCHI/GDAXI and the U.S. and Japanese indices represented by N225/NDX. Specifically, compared to the TSPO model, the DR-TSPO model under the distributionally robust optimization framework demonstrates a clear advantage in robustness. Its out-of-sample average drawdown is consistently lower than that of the TSPO model and at least no worse.
In the Eurozone markets (such as the UK, France, and Germany), the advantages of DR-TSPO are not as pronounced, likely due to the high interconnectivity of market information and relatively clear market distributions (low uncertainty regarding market distributions for investors). In these markets, the loss aversion optimization model demonstrates better drawdown control than the market index, but the DR-TSPO model performs similarly to the standard TSPO in terms of ambiguity aversion and employs similar rebalancing strategies. In contrast, in other markets, the DR-TSPO model generally outperforms the non-robust model. Particularly in the Hong Kong and Russian stock markets (represented by HSI and RTS), the DR-TSPO model shows better adaptability through effective rebalancing compared to TSPO. For example, in the HSI index experiment in Hong Kong, the two-stage loss aversion model outperforms the market index, and the DR-TSPO further strengthens loss control. In the Russian market, due to its high volatility, although the TSPO model underperforms the market index, the DR-TSPO achieves better drawdown levels through larger rebalancing. While in markets such as China and the U.S., the two-stage loss aversion portfolio’s drawdown performance is inferior to the market index, the DR-TSPO shows better adaptability and robustness when facing unknown market distributions compared to the non-robust TSPO model.
In subsequent experiments, we further analyze the performance of the TSPO and DR-TSPO models across several major stock indices in 2020, aiming to compare their risk and return characteristics in market environments with different distributional features. In 2020, global markets faced extreme volatility triggered by the COVID-19 pandemic and sharp changes in monetary policies by governments, resulting in a significant increase in market uncertainty. Against this backdrop, the main objective of portfolio optimization models is to achieve higher returns while maintaining a conservative approach.
As shown in
Table 5, the portfolio performance of the TSPO model and DR-TSPO model across different markets is presented.
On the SSE50 index, the DR-TSPO model achieves an annualized return of 0.4635, which is significantly higher than the TSPO’s 0.3020. Additionally, the DR-TSPO model has a lower volatility of 0.4430 compared to TSPO’s 0.5846, and it also performs better in terms of maximum drawdown, with DR-TSPO at 0.3405 versus TSPO’s 0.4528, indicating better capital protection. In 2020, the Chinese market experienced severe volatility during the early stages of the COVID-19 pandemic. The DR-TSPO model, using a distributionally robust optimization framework, was able to capture rebound opportunities under extreme market conditions while effectively reducing risk. This robust return characteristic highlights the advantage of DR-TSPO in highly volatile market environments. Especially during the pandemic and its aftermath, the traditional TSPO model may struggle to cope with such high market uncertainty and rapid changes, while the DR-TSPO effectively mitigates risk through its optimization strategy. Similarly, in the N225-based experiment, the DR-TSPO model shows an annualized return of 0.7257, slightly higher than TSPO’s 0.7189, with a volatility of 0.3119, compared to TSPO’s 0.4139, indicating better stability. In terms of maximum drawdown, DR-TSPO also outperforms TSPO, with 0.3533 versus 0.3830. In the experiments in the Chinese Shanghai and Japanese markets, the DR-TSPO model significantly outperforms the traditional two-stage loss aversion model in both volatility and profitability. The risk-adjusted Sharpe ratio and Sortino ratio further validate that the distributionally robust optimization method can maintain a certain level of robustness while overcoming over-conservatism, aiming for higher portfolio returns.
Moreover, the DR-TSPO model also performs well in the Eurozone markets, including the FTSE, FCHI, and GDAXI indices, although the differences across metrics are small. For example, the annualized return on the FTSE is 0.1641 for DR-TSPO, slightly higher than TSPO's 0.1632, and DR-TSPO's volatility is slightly lower than TSPO's: 0.3444 versus 0.3460 for the FTSE, and 0.3460 versus 0.3466 for the FCHI. For the German market (GDAXI), the annualized return for DR-TSPO is 1.1657, slightly higher than TSPO's 1.1646, and the maximum drawdown is marginally lower, with DR-TSPO at 0.2495 compared to TSPO's 0.2496. The small differences can mainly be attributed to the influence of common monetary policies (such as European Central Bank regulation), close economic interconnections, synchronized impacts from the global economic environment, and similar industry structures and capital market linkages in the Eurozone markets. Because the market distribution characteristics are similar, the adaptive advantage of the robust optimization framework in addressing unknown distributions is less pronounced. However, DR-TSPO excels in controlling volatility and maximum drawdown, demonstrating its advantage in more mature and volatile markets.
A particularly notable performance is observed in the RTS market, where the Russian economy faced multiple challenges such as a sharp drop in oil prices, the outbreak of the COVID-19 pandemic, and international sanctions, leading to extremely high financial market uncertainty. As a high-volatility market dominated by downward trends, this posed a significant test for the risk control capabilities of portfolio optimization methods. The experimental results in
Table 5 indicate that both models posted negative annualized returns (−0.0905 for TSPO and −0.1207 for DR-TSPO), but DR-TSPO achieved lower volatility (0.3101 vs. 0.3759) and a smaller maximum drawdown (0.4093 vs. 0.5532), whereas TSPO's higher risk exposure suggests that investors could face substantial losses. In contrast, DR-TSPO, with its robust optimization model, effectively reduced volatility in this high-risk environment, protecting investors from excessive losses. Even though all return-related metrics were negative, DR-TSPO's ability to control volatility and manage risk remained a significant advantage, especially in a market characterized by high uncertainty and volatility. This allowed DR-TSPO to outperform the traditional TSPO in avoiding extreme losses.
For the HSI (Hong Kong Hang Seng Index), although the annualized return of DR-TSPO (1.2015) is lower than that of TSPO (1.7232), its volatility (0.4516) is significantly lower than TSPO's (0.5479), and its maximum drawdown (0.3579 vs. 0.4087) is also better than TSPO's. Similarly, in the NDX index experiment, despite the limited profitability of DR-TSPO, it still demonstrates strong risk control, effectively smoothing market fluctuations and providing more stable returns. Looking at the wealth growth across different markets (as shown in
Figure 7), the extreme declines in the RTS and N225 markets were the most severe, and the DR-TSPO model exhibited superior performance compared to the traditional TSPO model. Not only did it excel in volatility and maximum drawdown control, but it also helped avoid sudden losses from unknown factors, reflecting its conservative advantage. Overall, DR-TSPO, through its distribution-robust optimization framework, is better at balancing risk and return in high-uncertainty and high-volatility markets, offering better capital protection and delivering more robust investment performance.
In conclusion, the DR-TSPO model, through its extensive application and in-depth comparative testing across various international markets, has demonstrated its superior risk management and capital protection capabilities compared to the traditional TSPO model. This is especially true when faced with extreme market volatility, high uncertainty, and multiple external shocks such as the COVID-19 pandemic, oil price fluctuations, and international sanctions. With its advanced distribution-robust optimization framework, DR-TSPO excels in controlling volatility and maximum drawdown in markets ranging from China and Japan to the Eurozone and Russia. Even in markets where annualized returns do not show a clear advantage, its robust risk-adjusted performance stands out. This comprehensive and balanced investment strategy not only helps investors minimize potential losses in high-volatility environments but also supports more sustainable and steady wealth growth over the long term. Therefore, the DR-TSPO model provides a new and more reliable methodology for portfolio optimization, especially in the current global economic climate, which is increasingly complex and volatile. Its application value and practical significance are particularly prominent in this context.
8. Conclusions
Unlike existing studies that rely on predetermined loss reference points [
4,
5], our work integrates loss aversion theory with distributionally robust optimization to develop an adaptive loss reference point mechanism. This mechanism dynamically adjusts based on investors’ historical decisions and prevailing market conditions, thereby addressing the limitations of static reference point approaches. By explicitly capturing the path-dependent nature of loss aversion, our framework enhances decision-making flexibility across diverse market regimes. The proposed DR-TSPO model employs uncertainty sets to address distributional ambiguity, relaxing the conventional reliance on strict prior distribution assumptions [
26,
27]. This approach optimizes worst-case expected utility while maintaining robustness across different market regimes. The solution methodology demonstrates that the dual problem can be reformulated as a tractable second-order cone program. To enhance computational efficiency, we propose DL-CCA, a deep learning-based optimization algorithm that embeds constraint penalties within neural networks. Experimental comparisons demonstrate DL-CCA’s superior performance in solving large-scale optimization problems; it achieves an average optimal objective value of 0.0029 with merely 76.78 s of computation time, significantly outperforming traditional algorithms like Trust-Constr [
38] and HO [
39], as well as LSTM/CNN-based variants.
Comprehensive backtesting using global equity data reveals that DR-TSPO delivers stronger performance in volatile markets compared to traditional models. For instance, in China’s market, it achieves higher annualized returns (0.4635 vs. 0.3020) with lower volatility (0.4430 vs. 0.5846), demonstrating improved capital protection. The model particularly excels during extreme market conditions, such as the 2019-2020 period, where it provides more effective risk mitigation. The multidimensional implications of this research offer valuable insights for various financial market participants. For investors, the dynamic reference point mechanism optimizes decision-making processes and reduces irrational trading behaviors. Asset management institutions can leverage the DL-CCA algorithm’s efficiency to enable real-time portfolio rebalancing of complex strategies, thereby enhancing robo-advisory systems. Regulators may consider incorporating the model’s stress-testing performance into systemic risk monitoring frameworks. For policymakers, the findings support the development of algorithmic transparency standards and cross-border regulatory coordination to address emerging challenges in financial technology.
Future research directions could explore multi-asset extensions, macroeconomic factor integration, and reinforcement learning applications to further advance investment decision paradigms toward more intelligent and market-adaptive approaches.