1. Introduction
The mean–variance model proposed by Markowitz has long served as the cornerstone of modern portfolio theory, describing the allocation decisions of rational investors under uncertainty [1]. The framework assumes normally distributed returns and seeks to balance expected return against risk. However, subsequent studies have documented that financial returns deviate markedly from normality, exhibiting skewness, heavy tails, and excess kurtosis, features that reflect deep uncertainty in financial systems. Moreover, the quadratic utility implied by the mean–variance paradigm fails to capture realistic patterns of decreasing absolute risk aversion. In practice, investors are often willing to tolerate higher volatility in exchange for positively skewed and low-kurtosis returns [2]. Portfolios with such characteristics tend to offer greater potential for extreme gains [3] and stronger downside protection [4]. These observations have motivated the extension of portfolio theory to include higher-order moments as essential dimensions of risk and uncertainty.
The introduction of higher-order moments into portfolio selection has followed three main directions [5]. The first extends the classical mean–variance paradigm through utility expansion models, where investor preferences are approximated by a Taylor series that incorporates third- and fourth-order terms. Early contributions demonstrated that portfolios with positive skewness and low kurtosis can yield higher expected utility and reshape the efficient frontier beyond the traditional trade-off between return and variance [6]. Building on these foundations, Harvey et al. [7] theoretically generalized the Markowitz model by embedding skewness and kurtosis directly into the optimization frontier, showing that higher-order risk preferences can substantially alter efficient portfolio sets. A second research stream emphasizes robust estimation of higher-order moments, addressing the severe dimensionality and instability of coskewness and cokurtosis matrices in large-asset universes. Representative studies include the nearest-comoment estimator with latent factors, the independent component approach for modeling higher-order dependence, and the parsimonious estimation framework [8,9,10], which maintains statistical precision while reducing computational cost. More recently, tensor-based and weak-factor approaches incorporate higher-order cumulant tensors to capture complex dependence and asymmetry in high-dimensional financial systems [11]. A third direction involves multiobjective and stochastic optimization frameworks that explicitly integrate higher-order moments into tractable portfolio design. These methods balance return, variance, skewness, and kurtosis within a unified optimization scheme, often employing successive convex approximation or parametric skew-t formulations for scalability [12,13].
Beyond the traditional higher-moment framework, several alternative methodological paradigms have recently emerged in portfolio optimization under uncertainty. Generative modeling approaches construct portfolios by simulating return distributions from latent processes; for example, Cheng and Chen propose a unified framework that combines generative forecasts with various optimization objectives and portfolio-blending strategies [14]. Reinforcement learning and deep reinforcement learning methods directly learn allocation policies or ranking mechanisms from market data. Alzaman, for instance, introduces a stock-ranking and matching model that exemplifies the shift toward adaptive and data-driven portfolio allocation [15].
Taken together, these developments underscore that the estimation of higher-order moments remains a critical bottleneck in extending portfolio theory beyond the mean–variance framework. To mitigate estimation noise and dimensionality, a growing body of work has turned to shrinkage estimation techniques. The linear shrinkage model proposed by Martellini and Ziemann [6] extends the constant-correlation framework [16] and the covariance shrinkage methodology [17,18]. This approach effectively stabilizes higher-moment estimates by reducing sampling noise and improving numerical conditioning. Subsequent developments generalized the shrinkage framework to multifactor environments, enhancing estimation precision and robustness in large-scale portfolio applications [19,20].
The pioneering work of Ledoit and Wolf marked a turning point in robust covariance estimation [17,18]. Their linear shrinkage estimator addressed the instability of the sample covariance matrix by shrinking eigenvalues toward a common mean, substantially improving estimation reliability in high-dimensional settings. Subsequent extensions introduced nonlinear shrinkage, in which eigenvalue shrinkage is adapted locally according to the empirical spectral distribution, thereby achieving further gains in accuracy and robustness [21,22,23]. In their most recent formulation, Ledoit and Wolf [24] proposed a local shrinkage scheme that smooths neighboring eigenvalues rather than enforcing a global shrinkage target, effectively filtering high-dimensional noise. Despite the extensive progress in covariance shrinkage estimation, relatively little attention has been devoted to extending these principles to higher-order moment structures, such as coskewness and cokurtosis tensors. Most existing studies remain confined to second-order dependence, leaving the problem of higher-order estimation uncertainty largely unresolved.
Building upon this line of research, this paper is the first to extend the nonlinear shrinkage estimation framework from covariance matrices to higher-order moment matrices, thereby broadening its applicability to complex systems and enriching methodologies for managing uncertainty in higher-order dependence structures. The proposed approach integrates factor models into the nonlinear shrinkage process to address the “curse of dimensionality” inherent in large-asset settings. Compared with conventional sample estimation, three-factor estimation, five-factor estimation, and multifactor linear shrinkage estimation, the proposed method demonstrates dual advantages: it minimizes mean squared error (MSE) and maximizes the Percentage Relative Improvement in Average Loss (PRIAL). Beyond statistical gains, the model delivers clear economic benefits: portfolios based on the nonlinear shrinkage estimator achieve higher annualized returns, lower kurtosis, higher Sharpe ratios, and reduced maximum drawdowns, providing stronger resilience to uncertainty in complex financial markets.
The primary contributions of the paper are as follows:
- (1) It extends the nonlinear shrinkage framework from covariance matrices to higher-order moment tensors, addressing estimation uncertainty beyond the second moment.
- (2) It integrates multifactor dimension reduction and tensor supersymmetry to ensure computational tractability in large-asset settings.
- (3) It provides both theoretical justification and empirical evidence showing that the proposed estimator enhances portfolio robustness and investor welfare.
This paper fills this gap by generalizing nonlinear shrinkage to the realm of higher-order moment tensors within a multifactor framework, thereby providing a unified and theoretically grounded approach for reducing estimation noise in coskewness and cokurtosis estimation.
The remainder of this paper is organized as follows. Section 2 introduces the proposed nonlinear shrinkage method for higher-order moment modeling. Section 3 establishes its asymptotic properties. Section 4 reports the results of Monte Carlo simulations. Section 5 applies the method to portfolio construction. Section 6 conducts robustness checks on the portfolio results. Finally, Section 7 discusses the findings and outlines possible directions for future research.
2. Methodology
2.1. Representation of Higher-Order Moment Matrices
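Let the number of assets be $N$ and the number of observations be $T$. Denote by $\Sigma$, $\Phi$, and $\Psi$ the covariance, coskewness, and cokurtosis matrices, respectively, of the asset return matrix $R$. The definitions of $\Sigma$, $\Phi$, and $\Psi$ are as follows:

$$
\begin{aligned}
\Sigma &= E\left[(r_t-\mu)(r_t-\mu)^{\top}\right],\\
\Phi &= E\left[(r_t-\mu)(r_t-\mu)^{\top}\otimes(r_t-\mu)^{\top}\right],\\
\Psi &= E\left[(r_t-\mu)(r_t-\mu)^{\top}\otimes(r_t-\mu)^{\top}\otimes(r_t-\mu)^{\top}\right],
\end{aligned}
\tag{1}
$$

where $R$ is an $N \times T$ matrix of asset returns with $t$-th column $r_t$, and $\mu$ is the $N \times 1$ vector of mean returns. The matrices $\Sigma$, $\Phi$, and $\Psi$ correspond to the second-, third-, and fourth-order co-moment matrices of the returns, with respective dimensions $N \times N$, $N \times N^2$, and $N \times N^3$. The operator ⊗ denotes the Kronecker product. The elements of $\Sigma$, $\Phi$, and $\Psi$ can be expressed as follows:

$$
\begin{aligned}
\sigma_{ij} &= E\left[(r_i-\mu_i)(r_j-\mu_j)\right],\\
\phi_{ijk} &= E\left[(r_i-\mu_i)(r_j-\mu_j)(r_k-\mu_k)\right],\\
\psi_{ijkl} &= E\left[(r_i-\mu_i)(r_j-\mu_j)(r_k-\mu_k)(r_l-\mu_l)\right],
\end{aligned}
\tag{2}
$$

with $i, j, k, l = 1, \dots, N$. Accordingly, $\Phi$ and $\Psi$ can also be equivalently expressed as:

$$
\Phi = \left[\Phi_1 \mid \Phi_2 \mid \cdots \mid \Phi_N\right], \qquad
\Psi = \left[\Psi_{11} \mid \Psi_{12} \mid \cdots \mid \Psi_{1N} \mid \cdots \mid \Psi_{NN}\right],
\tag{3}
$$

where $\Phi_k$ is the $N \times N$ matrix with elements $\phi_{ijk}$ and $\Psi_{kl}$ is the $N \times N$ matrix with elements $\psi_{ijkl}$.
It should be noted that $\Sigma$, $\Phi$, and $\Psi$ can be regarded as flattened representations of spatial tensors in Euclidean space. From the expressions of higher-order moment matrices, it is evident that the number of parameters to be estimated grows exponentially with the order of the moment (as $N^k$ for the $k$-th order). This growth not only imposes a heavy computational burden but also amplifies estimation uncertainty. This highlights the intrinsic challenges of modeling complex financial systems characterized by nonlinear dependence and heavy-tailed distributions. When higher-order moment matrices are rank-deficient or incomplete, traditional estimation methods often fail to provide stable and reliable results. Therefore, when dealing with higher-order co-moment estimation, it is imperative to adopt methods that can reduce the number of parameters to be estimated.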
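As a minimal sketch of the definitions above, the sample co-moment matrices can be assembled with NumPy. The function name `comoment_matrices`, the `T x N` input layout, and the flattening order (column `j*N + k` of the coskewness matrix holds element `(i, j, k)`, zero-based) are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def comoment_matrices(returns):
    """Sample covariance (N x N), coskewness (N x N^2), cokurtosis (N x N^3).

    `returns` is a T x N array with one observation per row; this layout is
    an assumption of the sketch.
    """
    x = returns - returns.mean(axis=0)            # demean each asset
    t, n = x.shape
    cov = x.T @ x / t
    # Build the third- and fourth-order tensors, then flatten them so that
    # the first index stays as the row and the others unroll row-major.
    cosk = np.einsum("ti,tj,tk->ijk", x, x, x).reshape(n, n * n) / t
    coku = np.einsum("ti,tj,tk,tl->ijkl", x, x, x, x).reshape(n, n ** 3) / t
    return cov, cosk, coku

rng = np.random.default_rng(0)
sample = rng.standard_normal((200, 4))
cov, cosk, coku = comoment_matrices(sample)
print(cov.shape, cosk.shape, coku.shape)   # (4, 4) (4, 16) (4, 64)
```

With this flattening, the third central moment of a portfolio with weights `w` is `w @ cosk @ np.kron(w, w)`, which is a convenient consistency check on the layout.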
Supersymmetry in higher-order moment tensors implies that the tensors remain invariant under permutations of their indices. For instance, the third-order co-moment $\phi_{ijk}$ is identical regardless of whether the indices are ordered as $(i,j,k)$, $(j,i,k)$, or $(k,j,i)$. From a practical perspective, supersymmetry makes higher-order moment estimation more parsimonious and computationally efficient, which is essential in large-asset settings. By eliminating redundant parameters, it also reduces estimation noise and enhances the stability of portfolio optimization. This is particularly valuable for applications involving higher-order risk measures, such as skewness- or kurtosis-adjusted portfolio selection. For example, in the case of the third-order coskewness tensor, an unconstrained formulation would require estimating $N^3$ parameters. By applying supersymmetry, the number of unique elements is reduced to $N(N+1)(N+2)/6$, which greatly alleviates the curse of dimensionality.
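The size of this reduction is easy to tabulate. The sketch below counts the distinct elements of a supersymmetric tensor as the number of index multisets, which gives $N(N+1)(N+2)/6$ for coskewness and $N(N+1)(N+2)(N+3)/24$ for cokurtosis; the helper name `unique_comoments` is hypothetical.

```python
from math import comb

def unique_comoments(n_assets, order):
    """Number of distinct elements of a supersymmetric tensor of given order."""
    # Each distinct element corresponds to a multiset of `order` indices
    # drawn from `n_assets` assets.
    return comb(n_assets + order - 1, order)

for n in (5, 10, 30):
    print(n, n ** 3, unique_comoments(n, 3), n ** 4, unique_comoments(n, 4))
```

For 30 assets, the coskewness tensor has 27,000 entries but only 4960 distinct ones, and the cokurtosis tensor has 810,000 entries but only 40,920 distinct ones.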
Table 1 reports the dimensionality reduction effect of supersymmetry on the number of parameters to be estimated in higher-order moments.
The introduction of tensor supersymmetry ensures that identical statistical interactions among assets are treated equivalently, effectively reducing the number of unique parameters to be estimated. This structural constraint improves numerical stability without sacrificing model flexibility.
In this paper, we draw on the nonlinear shrinkage covariance estimation method [22,23,24] and the multifactor higher-order co-moment estimation approach [19] to develop a novel nonlinear shrinkage estimation method for multifactor higher-order co-moment matrices. This method effectively reduces the number of parameters to be estimated and mitigates estimation uncertainty. By locally adjusting eigenvalues, nonlinear shrinkage reduces noise and stabilizes higher-order moment estimation, making it particularly suitable for complex financial systems with high-dimensional interactions.
2.2. Factor Model Estimation of Higher-Order Moment Matrices
The return vector is assumed to follow a factor structure:

$$
r_t = \alpha + B f_t + \varepsilon_t, \tag{4}
$$

where $r_t$ is the $N$-dimensional return vector, $f_t$ is the $K$-dimensional factor vector, $\varepsilon_t$ represents idiosyncratic risks orthogonal to factors, $\alpha$ denotes the intercept vector capturing unexplained excess returns, and $B$ is the $N \times K$ full-rank factor loading matrix. The model assumes that factors are mutually independent, residuals are cross-sectionally independent, and factors are independent of residuals. By decomposing returns into common factors and idiosyncratic components, the factor model reduces the dimensionality of higher-order moment estimation. This not only alleviates the curse of dimensionality but also mitigates estimation uncertainty by filtering out noise and isolating the key drivers of dependence in complex financial systems. The higher-order moment matrices can be decomposed as:

$$
\begin{aligned}
\Sigma &= B\,\Sigma_f\,B^{\top} + \Sigma_{\varepsilon},\\
\Phi &= B\,\Phi_f\,(B^{\top}\otimes B^{\top}) + \Phi_{\varepsilon},\\
\Psi &= B\,\Psi_f\,(B^{\top}\otimes B^{\top}\otimes B^{\top}) + \Psi_{\varepsilon},
\end{aligned}
\tag{5}
$$

where $\Sigma$ denotes the covariance matrix (second-order moment), measuring return volatility, $\Phi$ represents the coskewness matrix (third-order moment), capturing return asymmetry, and $\Psi$ is the cokurtosis matrix (fourth-order moment), quantifying tail risk. $\Sigma_f$, $\Phi_f$, and $\Psi_f$ correspond to the covariance, coskewness tensor, and cokurtosis tensor of the factors, expressed in matrix form. Similarly, $\Sigma_{\varepsilon}$, $\Phi_{\varepsilon}$, and $\Psi_{\varepsilon}$ are the covariance, coskewness, and cokurtosis matrices of the residual term $\varepsilon_t$. The diagonal elements of $\Sigma_{\varepsilon}$ are the residual variances $\sigma_{\varepsilon_i}^2$, with off-diagonal elements equal to zero. $\Phi_{\varepsilon}$ is a matrix in which all entries are zero except for the $(i,\,(i-1)N+i)$-th element, where $i = 1, \dots, N$ and this element is defined as $E[\varepsilon_i^3]$ for all $i$. The elements of $\Psi_{\varepsilon}$ take the following form:
Let and denote the variances of the -th factor and the -th asset residual, respectively. Through tensor expansion, elements in the tensor are mapped to the matrix . Specifically, the element at position in the tensor is mapped to the position in , yielding , where .
Although factor models rely on the assumption of independence between factors and residuals, this assumption may be overly restrictive in empirical financial settings. Nevertheless, a growing body of literature indicates that factor-model-based estimation procedures remain robust even when the independence assumption is relaxed. Bai and Ng [25] demonstrate that consistent factor estimation is achievable under weak cross-sectional and temporal dependence of residuals. Ledoit and Wolf [17,18] further show that shrinkage-based covariance estimators retain good performance even when residuals are not strictly orthogonal to factors. More recent studies on approximate factor models explicitly allow for weak correlations and establish asymptotic properties under such relaxed conditions [26,27].
From Equations (4)–(6), it can be seen that when the number of assets is large, the small number of factors and the supersymmetry of the residual tensor allow the factor-model estimation of higher-order moments to significantly reduce the number of estimated parameters, effectively mitigating the curse of dimensionality caused by a large asset universe.
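To make the factor-based aggregation concrete, the following sketch assembles a covariance and a coskewness matrix from factor-level and residual-level inputs. It assumes the standard multifactor form (loadings times factor co-moments times Kronecker powers of the loadings, plus a sparse residual part) and a flattening in which column `j*N + k` (zero-based) holds element `(i, j, k)`. All function and argument names are hypothetical, and the cokurtosis case is omitted because its residual part involves cross terms that are not reproduced here.

```python
import numpy as np

def factor_covariance(loadings, factor_cov, resid_var):
    """Covariance implied by the factor model: B S_f B' + diag(resid_var)."""
    return loadings @ factor_cov @ loadings.T + np.diag(resid_var)

def factor_coskewness(loadings, factor_cosk, resid_m3):
    """Coskewness implied by the factor model (N x N^2).

    loadings    : N x K factor loading matrix
    factor_cosk : K x K^2 flattened factor coskewness
    resid_m3    : length-N vector of residual third moments
    """
    n = loadings.shape[0]
    cosk = loadings @ factor_cosk @ np.kron(loadings.T, loadings.T)
    # Cross-sectionally independent residuals contribute only to the
    # (i, i, i) elements, which sit in column i*N + i of row i.
    idx = np.arange(n)
    cosk[idx, idx * n + idx] += resid_m3
    return cosk

loadings = np.array([[1.0], [2.0]])               # two assets, one factor
cosk = factor_coskewness(loadings, np.array([[1.0]]), np.array([0.1, 0.2]))
print(cosk.shape)   # (2, 4)
```

In this form, only the K x K^2 factor coskewness, the N x K loadings, and N residual third moments need to be estimated, rather than all distinct elements of the N x N^2 matrix.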
2.3. Nonlinear Shrinkage Estimation of Higher-Order Moment Matrices
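Let $\Sigma$ denote the $N$-dimensional positive-definite population covariance matrix, and $S$ denote the sample covariance matrix. Its spectral decomposition is $S = U \Lambda U^{\top}$, where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_N)$ is the diagonal matrix of sample eigenvalues, arranged in ascending order, and $U = [u_1, \dots, u_N]$ is the orthogonal matrix of corresponding eigenvectors. Equivalently, the sample covariance matrix can be expressed as $S = \sum_{i=1}^{N} \lambda_i u_i u_i^{\top}$. We then construct a covariance matrix estimator of the form $\hat{\Sigma} = U D U^{\top} = \sum_{i=1}^{N} \delta_i u_i u_i^{\top}$, where $D = \operatorname{diag}(\delta_1, \dots, \delta_N)$, and $\delta_i = \varphi(\lambda_i)$ denotes the estimated eigenvalue obtained by applying a shrinkage function $\varphi$ to the $i$-th sample eigenvalue $\lambda_i$.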
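The estimator form described above (keep the sample eigenvectors, replace the eigenvalues) can be sketched as follows. The shrinkage function used here, `shrink_toward_mean`, is only a placeholder that pulls eigenvalues linearly toward their average; it is not the LIS, QIS, or GIS formula, and both function names are hypothetical.

```python
import numpy as np

def spectral_estimator(sample_cov, shrink_fn):
    """Rebuild a covariance estimate from shrunken sample eigenvalues."""
    lam, u = np.linalg.eigh(sample_cov)   # eigenvalues in ascending order
    d = shrink_fn(lam)                    # shrunken eigenvalues
    return (u * d) @ u.T                  # U diag(d) U'

def shrink_toward_mean(lam, intensity=0.5):
    """Placeholder shrinkage: linear pull toward the mean eigenvalue."""
    return (1.0 - intensity) * lam + intensity * lam.mean()

rng = np.random.default_rng(1)
x = rng.standard_normal((50, 5))
s = np.cov(x, rowvar=False)
s_shrunk = spectral_estimator(s, shrink_toward_mean)
print(np.linalg.eigvalsh(s))
print(np.linalg.eigvalsh(s_shrunk))
```

Passing a different `shrink_fn` changes only the eigenvalues, which is the sense in which such estimators differ from one another while sharing the sample eigenvectors.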
To demonstrate the superior performance of higher-order moment shrinkage estimation in terms of estimation accuracy, as well as its applicability and stability under various loss functions, this paper considers three shrinkage functions under different loss functions: the Stein loss function (referred to as linear inverse shrinkage, LIS), the Frobenius loss function (referred to as quadratic inverse shrinkage, QIS), and the symmetric Kullback–Leibler loss function (referred to as geometric inverse shrinkage, GIS).
2.3.1. Stein Loss Function
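The Stein loss function is defined as

$$
L^{S}(\hat{\Sigma}, \Sigma) = \frac{1}{N}\operatorname{tr}\left(\Sigma^{-1}\hat{\Sigma}\right) - \frac{1}{N}\log\det\left(\Sigma^{-1}\hat{\Sigma}\right) - 1, \tag{7}
$$

where $\Sigma$ is the population covariance matrix, $\hat{\Sigma}$ is its estimator, $\operatorname{tr}(\cdot)$ denotes the trace, and $\det(\cdot)$ denotes the determinant. The optimization solution for the loss function is given by: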
The optimal estimator for
is given by
where
is infeasible in practice because the population eigenvalues
embedded in
are unobservable. Stein [
28] proposed an approximate estimator for the unobservable
as follows:
Taking the inverse of Equation (10) yields:
In Equation (10), denotes the target estimator, where the first term represents the retained component and the second term represents the target smoothing component. serves as the weighted average of raw inverse eigenvalues. In Equation (11), measures the reciprocal of the difference between inverse eigenvalues, acting as a metric of “attractiveness” among eigenvalue estimates. This formulation clearly demonstrates linear shrinkage with respect to inverse eigenvalues, which combines and the smoothing term via a convex linear combination. The shrinkage intensity is , allowing for stronger shrinkage as dimensionality increases.
The function , referred to as the “Stein shrinkage,” possesses the following properties. First, it induces mutual attraction between eigenvalues. Second, higher-precision eigenvalues exert a stronger influence. Third, as the distance between eigenvalues increases, the denominator of the cross-term grows, leading the attractiveness measure to approach zero. Fourth, when eigenvalues are extremely close, the term approaches infinity.
The issue described in the fourth property above is addressed by the following novel smoothing formulation:
where the smoothing parameter
, with
,
. To address the issue of excessively large values caused by the fourth characteristic of the “Stein shrinkage [
24]”, Equation (13) adopts the functional form to replace
in Equation (11).
Based on Equation (13), this shrinkage quantity is referred to as the “smoothed Stein shrinkage”. When , it is equivalent to Equation (11) and exhibits no smoothing effect; as increases, the smoothing effect strengthens. If , the influence of on is positive, indicating that tends to ; if , the influence of on is negative, indicating that tends to decrease (i.e., tends to ). Consequently, this shrinkage exhibits locality: the impact of more distant eigenvalues decays rapidly as the distance increases.
The covariance matrix estimator is ultimately formulated as , which is referred to as “linear inverse shrinkage (LIS)”.
2.3.2. Frobenius Loss Function
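For an estimator $\hat{\Sigma}$ of the population covariance matrix $\Sigma$, the Frobenius loss function is defined as:

$$
L^{F}(\hat{\Sigma}, \Sigma) = \frac{1}{N}\operatorname{tr}\left[\left(\hat{\Sigma} - \Sigma\right)^{2}\right]. \tag{14}
$$

Optimizing this loss function yields the following shrinkage estimator [24]: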
The conjugate of
is given by:
The first two terms of Equation (15) match the form of Equation (12) but differ in coefficients, with an additional third term. Following Ledoit and Péché [
29], this third term constructs a quadratic oscillation term from
and its conjugate
. The coefficients satisfy:
Since the three weighting coefficients are quadratic functions of , their sum forms a perfect square, leading to the name “quadratic inverse shrinkage (QIS)” for Equation (15).
2.3.3. Kullback–Leibler Loss Function
The Kullback–Leibler loss function is defined as:
By solving the optimization problem associated with the loss function, the optimal shrinkage estimator [24] is derived in the following form:
where the two component estimators are the LIS and QIS covariance matrix estimators computed via Equations (7) and (14), respectively. The resulting estimator is termed the “geometric inverse shrinkage (GIS)” estimator and is obtained as the geometric mean of the LIS and QIS estimators.
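As a small illustration of the geometric-mean construction, the sketch below assumes that the LIS and QIS estimators share the sample eigenvectors, in which case the matrix geometric mean reduces to an eigenvalue-wise geometric mean. The function name and the example eigenvalues are hypothetical.

```python
import numpy as np

def geometric_mean_spectrum(d_lis, d_qis):
    """Eigenvalue-wise geometric mean of two shrunken spectra."""
    d_lis = np.asarray(d_lis, dtype=float)
    d_qis = np.asarray(d_qis, dtype=float)
    return np.sqrt(d_lis * d_qis)

# Illustrative shrunken eigenvalues from two estimators (made-up numbers).
d_gis = geometric_mean_spectrum([1.0, 4.0], [4.0, 9.0])
print(d_gis)   # [2. 6.]
```

Each GIS eigenvalue therefore lies between the corresponding LIS and QIS eigenvalues.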
Nonlinear shrinkage improves estimation by locally adjusting eigenvalues, thereby reducing noise and uncertainty in high-dimensional settings. Unlike global shrinkage methods, it adapts to the local structure of eigenvalue distributions, stabilizing higher-order moment estimation in complex systems where interactions among assets are nonlinear and highly interdependent.
2.4. Nonlinear Shrinkage Higher-Order Moment Estimation Process
The proposed nonlinear shrinkage estimation of higher-order moments proceeds as follows:
- (1) Regress the return matrix on the factor model. Using the factor model in Equation (4), the model parameters are estimated and the residuals are obtained.
- (2) Estimate higher-order moments of residuals. First, compute the eigenvalues and eigenvectors of the covariance matrix of residuals. Then, apply nonlinear shrinkage to the eigenvalues using the procedures in Equations (12), (15), and (19). Reconstruct the covariance matrix by combining the shrunken eigenvalues with the original eigenvectors, thereby obtaining the nonlinear shrinkage estimate of the residual covariance matrix. Substitute the variance elements of this estimate into Equation (6) to derive the fourth-moment estimate of the residuals. Additionally, an estimate of the third-order moments of residuals is defined.
- (3) Estimate higher-order moments of factors. Substitute factor data into Equation (1) to compute the factor covariance, coskewness, and cokurtosis matrices.
- (4) Aggregate moments of returns. Substitute the residual multi-order moment estimates from Step (2) and the factor multi-order moment estimates from Step (3) into Equation (5) to derive the joint multi-order moment estimates of returns under nonlinear shrinkage.
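The four steps above can be sketched for the second moment as follows. This is a simplified illustration under stated assumptions: the regression is plain OLS with an intercept, the shrinkage function is supplied by the caller (the example uses a linear placeholder, not LIS, QIS, or GIS), and only the covariance matrix is aggregated. The function name and data layout (`T x N` returns, `T x K` factors) are hypothetical.

```python
import numpy as np

def factor_shrinkage_covariance(returns, factors, shrink_fn):
    """Covariance of returns from a factor model with shrunken residual spectrum."""
    t = returns.shape[0]
    # Step (1): OLS regression of returns on an intercept and the factors.
    design = np.column_stack([np.ones(t), factors])
    coef, *_ = np.linalg.lstsq(design, returns, rcond=None)
    loadings = coef[1:].T                       # N x K
    resid = returns - design @ coef             # T x N
    # Step (2): shrink the eigenvalues of the residual covariance matrix,
    # rebuild it, and keep its diagonal as the residual variances.
    lam, u = np.linalg.eigh(np.cov(resid, rowvar=False))
    resid_cov = (u * shrink_fn(lam)) @ u.T
    resid_var = np.diag(resid_cov)
    # Step (3): second moment of the factors.
    factor_cov = np.atleast_2d(np.cov(factors, rowvar=False))
    # Step (4): aggregate factor and residual parts.
    return loadings @ factor_cov @ loadings.T + np.diag(resid_var)

rng = np.random.default_rng(2)
f = rng.standard_normal((300, 3))
b = rng.standard_normal((6, 3))
r = 0.01 + f @ b.T + 0.5 * rng.standard_normal((300, 6))
est = factor_shrinkage_covariance(r, f, lambda lam: 0.5 * lam + 0.5 * lam.mean())
print(est.shape)   # (6, 6)
```

The third- and fourth-order aggregates follow the same pattern, with Kronecker powers of the loadings in place of the single transpose.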
4. Monte Carlo Simulation
4.1. Simulation Design
We use constituent stocks of China’s A-share market and obtain weekly return data from the Wind database, spanning January 2006 to December 2020. Parameters of factor loadings and residual structures in the multifactor models are estimated using OLS and subsequently employed to construct the data-generating process (DGP) for the Monte Carlo simulation design. A 2 × 3 sorting method is applied to construct multifactor investment portfolios. The five-factor model includes the market excess return (MKT-RF), the size factor SMB (Small Minus Big), the value factor HML (High Minus Low), the profitability factor RMW (Robust Minus Weak), and the investment factor CMA (Conservative Minus Aggressive). The three-factor model retains MKT-RF, SMB, and HML. In the Monte Carlo simulations, asset dimensionality is set to 5, 10, and 30, while sample sizes are set to 50, 100, 500, and 1000. By varying the number of assets and sample sizes, we evaluate how nonlinear shrinkage estimation manages uncertainty across different levels of system complexity. For each parameter configuration, 200 independent replications are performed to ensure the robustness and representativeness of the statistical results. The Monte Carlo simulation proceeds as follows:
- (1) Generate factor data. We first generate the distributional parameters of the factor data, including the location vector, scale matrix, skewness parameters, and degrees of freedom, based on the underlying factor distribution. We then simulate a random sample of factor data with sample size $T$ from the multivariate skew-t distribution with these parameters.
- (2) Generate factor loadings. The distribution parameters of the factor loading matrix are estimated using OLS. A random sample of the factor loading matrix is then simulated from the multivariate skew-t distribution with these parameters.
- (3) Generate residuals. The probability density function of the residuals is specified in Equation (28), whose parameters are the degrees of freedom, the asymmetry parameter, and the location and scale parameters of the skewed Student’s t distribution. Residuals are initially obtained from OLS. Based on these estimates, the parameter vector is obtained under the skewed Student’s t distribution and subsequently used to simulate the residual vector.
- (4) Generate returns. Using the outputs from Steps (1)–(3), we substitute them into Equation (4) to obtain a random sample of returns of length $T$.
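- (5) Compute the PRIAL. We first compute the MSE of the sample estimator, the linear shrinkage estimator (SN), and the nonlinear shrinkage estimator (SH) under three loss functions. Using these MSEs, we then calculate PRIAL values of the nonlinear shrinkage estimator relative to both the sample and linear shrinkage estimators. PRIAL is given by

$$
\mathrm{PRIAL} = \frac{\mathrm{MSE}_{\mathrm{ref}} - \mathrm{MSE}_{\mathrm{SH}}}{\mathrm{MSE}_{\mathrm{ref}}} \times 100\%,
$$

where $\mathrm{MSE}_{\mathrm{ref}}$ is the MSE of the reference estimator (sample or linear shrinkage) and $\mathrm{MSE}_{\mathrm{SH}}$ is the MSE of the nonlinear shrinkage estimator. Higher PRIAL values indicate that the nonlinear shrinkage estimator achieves smaller MSE compared with the sample and linear shrinkage estimators, thus demonstrating the superior accuracy of the proposed method.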
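The PRIAL computation in Step (5) can be sketched in a few lines. The sketch assumes the usual definition as the percentage reduction in MSE relative to a reference estimator, with the MSE taken as the average squared Frobenius distance to the true matrix across replications; the function names and toy matrices are hypothetical.

```python
def frobenius_mse(estimates, truth):
    """Average squared Frobenius distance between estimates and the truth.

    `estimates` is a list of matrices (nested lists), one per replication.
    """
    total = 0.0
    for est in estimates:
        total += sum((e - t) ** 2
                     for row_e, row_t in zip(est, truth)
                     for e, t in zip(row_e, row_t))
    return total / len(estimates)

def prial(mse_reference, mse_candidate):
    """Percentage relative improvement of the candidate over the reference."""
    return 100.0 * (mse_reference - mse_candidate) / mse_reference

truth = [[1.0, 0.0], [0.0, 1.0]]
reference = [[[1.2, 0.0], [0.0, 0.8]]]    # one replication, toy numbers
candidate = [[[1.1, 0.0], [0.0, 0.9]]]
value = prial(frobenius_mse(reference, truth), frobenius_mse(candidate, truth))
print(round(value, 1))   # 75.0
```

A value of zero means no improvement over the reference, and negative values mean the candidate estimator is less accurate than the reference.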
4.2. Simulation Results
Table 2 reports PRIAL values for higher-order moment matrices, computed from return data generated by the five-factor model. Positive PRIAL values indicate not only that the MSE of the nonlinear shrinkage estimator is smaller than that of the sample and linear shrinkage estimators, but also that the associated estimation uncertainty is relatively lower. These results demonstrate that the nonlinear shrinkage estimator for higher-order moment matrices enhances the precision of estimating covariance, coskewness, and cokurtosis matrices. Tables A1 and A2 in Appendix A.1 exhibit similar results.
When applying nonlinear shrinkage estimation to covariance matrices, PRIAL results indicate that this approach outperforms both the sample and linear shrinkage methods. For a fixed number of observations $T$, PRIAL values gradually increase as the number of assets $N$ grows. For instance, when $T = 1000$, as $N$ increases from 5 to 30, the PRIAL value of SHFF rises from 12.943 to 35.023, and that of FF rises from 11.683 to 31.137. The results show that as the number of assets increases, nonlinear shrinkage more effectively stabilizes estimation in high-dimensional complex systems, filtering out noise while preserving meaningful dependence structures. Conversely, for a fixed number of assets $N$, PRIAL values generally decline as the number of observations increases. For example, when $N = 5$, as $T$ increases from 50 to 1000, the PRIAL value of SHFF decreases from 19.623 to 12.943, and that of FF decreases from 18.331 to 11.683. This indicates that as the data quantity grows, the relative improvement offered by nonlinear shrinkage methods narrows, although their absolute precision remains superior to traditional approaches. Overall, the improvement effect becomes more pronounced with increasing $N$ and diminishes with increasing $T$.
In the context of coskewness matrix estimation, PRIAL results indicate that, relative to sample estimation, the SHFF value of the nonlinear shrinkage estimator gradually increases as the number of assets rises while the number of observations is held constant. For example, when $T = 1000$, the PRIAL value increases from 78.276 to 91.644 as $N$ grows from 5 to 30. This finding indicates that nonlinear shrinkage methods can more accurately capture asymmetric dependence structures among asset returns when the investment portfolio includes a larger number of assets. Such improvement enables investors to identify assets with positive skewness (i.e., right-skewed return distributions) and optimize portfolio allocations to exploit asymmetric return opportunities. When the number of assets is fixed and the number of observations increases, the PRIAL value of SHFF also shows a gradual upward trend. For example, when $N = 5$, as $T$ increases from 50 to 1000, the PRIAL value rises from 26.229 to 78.276. This demonstrates that nonlinear shrinkage estimation can more effectively filter out noise in coskewness data as the data volume expands, facilitating investors’ allocation of assets with positive skewness to achieve higher returns. Overall, the enhancement effects intensify with increases in both the number of assets $N$ and the number of observations $T$. Although the PRIAL value of the FF specification is positive, it is much smaller, indicating that the estimation performances of the two approaches are very similar. This outcome is primarily driven by the structural design of the proposed method.
For cokurtosis matrix estimation, the nonlinear shrinkage estimator outperforms both the sample and linear shrinkage estimators. When the number of observations $T$ is fixed, PRIAL values gradually rise as the number of assets $N$ increases. For example, when $T = 1000$ and $N$ increases from 5 to 30, the SHFF-based PRIAL value rises from 66.977 to 89.219. This pattern indicates that in high-dimensional portfolio settings, the nonlinear shrinkage estimator improves accuracy by locally adjusting the eigenvalue distribution of the cokurtosis matrix. The resulting gains help investors identify tail co-movement and crash-prone configurations, thereby strengthening tail-risk management. When the number of assets is fixed, increasing the number of observations leads to a gradual rise in the PRIAL value. For example, with $N = 5$, as $T$ increases from 50 to 1000, the PRIAL value of SHFF increases from 26.145 to 66.977. Larger samples enable the nonlinear shrinkage estimator to significantly mitigate estimation bias in the cokurtosis matrix, allowing investors to more reliably extract tail-dependence structures from historical data. By comparison, the PRIAL values for FF remain consistently positive, confirming the superior precision of nonlinear shrinkage methods, although no systematic directional trend is observed.
These improvements reflect not only better risk–return trade-offs but also enhanced resilience to systemic uncertainty.
Tables A3, A4, and A5 report the PRIAL values of higher-order moments based on return data generated by the three-factor model. The results show that the nonlinear shrinkage estimator continues to improve the estimation accuracy of higher-order moments, consistent with the findings under the five-factor model.
5. Empirical Analysis
5.1. Data Processing
This paper employs weekly returns of constituent stocks in China’s A-share market from January 2006 to December 2020. To ensure sample consistency and reliability, stocks with trading suspensions exceeding 20 consecutive weeks during the sample period are excluded. After screening, 100 stocks are retained, yielding 764 usable weekly observations. Subsequently, subsamples of different sizes (e.g., 10 stocks, 30 stocks) are randomly selected for analysis. For sample partitioning, data from January 2006 to December 2010 are used as the training set to estimate model parameters, while data from January 2011 to December 2020 constitute the test set for model validation and portfolio evaluation. To enhance estimation timeliness and adaptability in the presence of uncertainty, a rolling-window estimation procedure is employed, which allows the model to adjust continuously to the evolving dynamics of complex financial systems. In each period, the higher-order moment matrices are estimated using the preceding five years of historical data and applied to portfolio construction.
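The rolling-window scheme described above can be sketched as an index generator: each period uses the preceding `window` observations for estimation and the next observation for evaluation. The function name is hypothetical, and the window length in weeks is left as a parameter because the exact number of weekly observations in five years is not stated here.

```python
def rolling_windows(n_obs, window, step=1):
    """Yield (estimation_slice, evaluation_index) pairs for a rolling scheme."""
    for end in range(window, n_obs, step):
        # Estimate on observations [end - window, end), evaluate at `end`.
        yield slice(end - window, end), end

# Toy example: 10 observations, estimation window of 4.
for est_slice, eval_idx in rolling_windows(10, 4):
    print(est_slice.start, est_slice.stop, eval_idx)
```

In an application, the higher-order moment matrices would be re-estimated on `returns[est_slice]` and the resulting weights applied to the return at `eval_idx`.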
5.2. Maximizing Expected Utility Portfolio
The portfolio objective function is constructed under a CRRA preference framework, incorporating variance, skewness, and kurtosis. We compare portfolios estimated via nonlinear shrinkage with those based on sample and linear shrinkage estimators to identify optimal asset allocations. This comparison highlights each method’s ability to manage estimation uncertainty and systemic complexity in portfolio construction. Assuming zero expected returns for all assets, the portfolio objective function with short-sale constraints is given by:
where $w_i$ denotes the weight of stock $i$ and $\gamma$ represents the risk-aversion coefficient. This paper considers two levels of risk aversion. To evaluate the practical applicability of the proposed methodology, we conduct out-of-sample performance tests on portfolios constructed under different estimation frameworks. In addition to annualized return (AR), we assess portfolio performance using multiple metrics, including annualized volatility (AV), value at risk (VaR), Sharpe ratio (SR), kurtosis, and maximum drawdown (MD), to provide a systematic evaluation from both return and risk perspectives.
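For illustration, a commonly used fourth-order Taylor approximation of CRRA expected utility with zero expected returns can be evaluated as below. This is an assumption about the functional form, not a transcription of the paper's objective: the coefficients $\gamma/2$, $\gamma(\gamma+1)/6$, and $\gamma(\gamma+1)(\gamma+2)/24$ are the standard expansion terms, and the co-moment matrices are assumed to be flattened so that column `j*N + k` holds element `(i, j, k)`. The function name is hypothetical.

```python
import numpy as np

def crra_utility_4th_order(w, cov, cosk, coku, gamma):
    """Fourth-order CRRA utility approximation with zero expected returns."""
    w = np.asarray(w, dtype=float)
    w2 = np.kron(w, w)                 # w (x) w
    w3 = np.kron(w2, w)                # w (x) w (x) w
    variance = w @ cov @ w             # second portfolio moment
    skewness = w @ cosk @ w2           # third portfolio moment
    kurtosis = w @ coku @ w3           # fourth portfolio moment
    return (-gamma / 2.0 * variance
            + gamma * (gamma + 1.0) / 6.0 * skewness
            - gamma * (gamma + 1.0) * (gamma + 2.0) / 24.0 * kurtosis)

# Single-asset toy example with made-up moments and gamma = 2.
u = crra_utility_4th_order([1.0], np.array([[0.04]]), np.array([[0.001]]),
                           np.array([[0.005]]), gamma=2.0)
print(round(u, 6))   # -0.044
```

Under short-sale constraints, this function would be maximized over weights that are non-negative and sum to one; the choice of optimizer is left open here.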
Table 3 reports the out-of-sample portfolio performance of various estimation methods when the number of assets is 30 and the number of factors is 5. Results from the fourth-order approximation of the investor utility function are reported, where the risk-aversion coefficient is set to 5. The results reveal the following:
- (1)
In terms of annualized return, the portfolio constructed using the nonlinear shrinkage estimator outperforms all other methods, followed by the five-factor approach; both substantially exceed the linear shrinkage and sample covariance estimators. This indicates that nonlinear shrinkage estimation more effectively captures return structures, thereby improving portfolio performance.
- (2)
Regarding the Sharpe ratio, the nonlinear shrinkage method again yields the highest risk-adjusted return, demonstrating its ability to generate excess returns while controlling risk. The five-factor model ranks second, whereas the sample covariance and linear shrinkage methods perform relatively weakly. This suggests that the nonlinear shrinkage portfolio achieves higher excess returns per unit of risk, thereby combining high annualized return with superior downside protection.
- (3)
With respect to the maximum drawdown ratio, the nonlinear shrinkage estimator slightly outperforms other methods in extreme risk control, producing the smallest drawdown magnitude. A lower maximum drawdown ratio implies that this estimator more effectively mitigates potential losses under extreme market conditions, enhancing portfolio resilience to tail risk.
- (4)
Based on the fourth-order CRRA utility expansion, nonlinear shrinkage estimators yield the highest gains in expected utility, indicating their superior risk-adjusted performance. For moderate risk aversion (a coefficient of 5), the utility gains of the nonlinear shrinkage estimators range from 4.67% to 5.26%, while for more risk-averse investors (a coefficient of 10), they rise to between 6.18% and 8.13%. These gains are mainly driven by lower kurtosis and reduced tail risk.
Table 3.
Out-of-sample portfolio performance under CRRA utility with 30 assets and five factors across different methods.
| Coefficient | Method | AR | AV | VaR | SR | Kurtosis | MD | Utility Gain |
|---|---|---|---|---|---|---|---|---|
| 5 | Sample | 11.837 | 0.216 | −0.241 | 0.548 | 4.259 | 22.103 | - |
| 5 | Five-Factor | 13.033 | 0.221 | −0.238 | 0.593 | 3.980 | 20.379 | 3.65% |
| 5 | Linear | 11.667 | 0.212 | −0.232 | 0.549 | 4.201 | 20.925 | 4.12% |
| 5 | QIS | 13.439 | 0.220 | −0.238 | 0.609 | 3.880 | 20.212 | 5.26% |
| 5 | LIS | 13.187 | 0.222 | −0.242 | 0.594 | 3.911 | 20.156 | 4.67% |
| 5 | GIS | 13.354 | 0.221 | −0.240 | 0.603 | 3.896 | 20.225 | 5.25% |
| 10 | Sample | 11.709 | 0.217 | −0.244 | 0.539 | 4.232 | 22.407 | - |
| 10 | Five-Factor | 12.666 | 0.222 | −0.241 | 0.576 | 4.019 | 21.190 | 4.85% |
| 10 | Linear | 11.379 | 0.213 | −0.234 | 0.535 | 4.199 | 21.273 | 4.27% |
| 10 | QIS | 13.090 | 0.221 | −0.241 | 0.592 | 3.917 | 20.993 | 8.13% |
| 10 | LIS | 12.828 | 0.222 | −0.244 | 0.577 | 3.945 | 21.028 | 6.18% |
| 10 | GIS | 13.000 | 0.221 | −0.242 | 0.587 | 3.930 | 21.153 | 6.45% |
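The performance metrics reported in the table can be computed from a series of out-of-sample portfolio returns roughly as sketched below. The exact conventions used in the paper are not stated in this section, so the following are assumptions: weekly data annualized with 52 periods, arithmetic annualization of the mean, a zero risk-free rate in the Sharpe ratio, historical VaR taken as a simple empirical lower quantile, population (not sample) moments, and maximum drawdown measured on compounded wealth. The function name and dictionary keys are illustrative.

```python
import math

WEEKS_PER_YEAR = 52  # assumption: returns are weekly


def performance_metrics(returns, var_level=0.05):
    """Compute AR, AV, VaR, SR, kurtosis, and MD for a list of periodic returns."""
    n = len(returns)
    mean = sum(returns) / n
    dev = [r - mean for r in returns]
    m2 = sum(d ** 2 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n

    ar = mean * WEEKS_PER_YEAR                  # annualized return (arithmetic)
    av = math.sqrt(m2 * WEEKS_PER_YEAR)         # annualized volatility
    sr = ar / av if av > 0 else float("nan")    # Sharpe ratio, zero risk-free rate
    kurt = m4 / m2 ** 2 if m2 > 0 else float("nan")  # raw (non-excess) kurtosis

    # Historical VaR: an empirical lower quantile of the return distribution.
    ordered = sorted(returns)
    var = ordered[min(n - 1, int(var_level * n))]

    # Maximum drawdown: largest peak-to-trough loss of compounded wealth.
    wealth, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        mdd = max(mdd, 1.0 - wealth / peak)

    return {"AR": ar, "AV": av, "VaR": var, "SR": sr, "Kurtosis": kurt, "MD": mdd}
```

Under these conventions a normally distributed return series would have a kurtosis of about 3, so values near 4 in the table indicate heavier tails than the normal benchmark.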
These results underscore the capacity of nonlinear shrinkage to reduce uncertainty and strengthen portfolio robustness. While measures such as VaR and annualized volatility display relatively similar outcomes across methods, the kurtosis of the nonlinear shrinkage estimator remains the lowest. This highlights its superior performance in mitigating extreme values in the return distribution and reducing exposure to large market fluctuations. Furthermore, the nonlinear shrinkage estimators enhance investors’ welfare by delivering more stable risk-adjusted performance. A 4–8% improvement in CRRA utility corresponds to a substantial increase in certainty-equivalent wealth, which can offset moderate transaction costs or management fees typically observed in institutional portfolios. In addition, the reduction in higher-order risks (particularly kurtosis) suggests that the proposed estimators are especially valuable in large-asset settings, where diversification alone cannot fully eliminate tail dependencies. Overall, under comparable levels of annualized volatility and maximum drawdown, the nonlinear shrinkage portfolio not only delivers higher annualized returns but also achieves a higher Sharpe ratio. This implies that investors can obtain greater returns without taking on additional risk, or bear lower risk for the same expected return level—an outcome particularly relevant for those pursuing high risk-adjusted performance.
When the risk-aversion coefficient increases from 5 to 10, annualized return and the Sharpe ratio decline slightly for every method (for QIS, from 13.439 to 13.090 and from 0.609 to 0.592, respectively), while the other metrics remain largely unchanged. The nonlinear shrinkage estimators keep the highest annualized returns and Sharpe ratios at both levels of risk aversion, so their favorable properties are not sensitive to this choice.
Table 4 reports the out-of-sample portfolio performance of different estimation methods when the number of assets is small (10 assets). The results reveal the following:
- (1)
Annualized returns are relatively similar across methods, with sample estimates delivering the highest returns and the five-factor model delivering the lowest. Nonlinear and linear shrinkage show only minor differences in return prediction, suggesting comparable performance in capturing return dynamics.
- (2)
The ranking of Sharpe ratios closely mirrors that of annualized returns. Nonlinear shrinkage performs on par with or slightly worse than linear shrinkage, indicating that when portfolios consist of a small number of assets, simple linear methods may sufficiently capture the key risk–return trade-offs. In such cases, linear shrinkage offers comparable effectiveness while being more parsimonious and operationally efficient.
- (3)
Analysis of kurtosis shows that nonlinear shrinkage yields lower kurtosis than linear shrinkage, implying better mitigation of extreme values, a more concentrated return distribution, and improved portfolio stability.
- (4)
Comparing Table 4 with Table 3 reveals that as the asset dimension increases from 10 to 30, the advantages of nonlinear shrinkage in annualized return, Sharpe ratio, and kurtosis become more pronounced. Although higher dimensionality enhances diversification and thus mean returns, it also introduces greater estimation uncertainty and tail risk, leading to slightly smaller CRRA utility gains. This outcome reflects a rational trade-off between improved returns and controlled risk exposure. Even so, under lower risk aversion (a coefficient of 5), nonlinear shrinkage continues to outperform the sample and linear estimators, indicating its effectiveness in capturing complex interdependencies among a larger set of assets and enhancing overall portfolio robustness.
Table 4.
Out-of-sample portfolio performance under CRRA utility with 10 assets and five factors across different methods.
| Coefficient | Method | AR | AV | VaR | SR | Kurtosis | MD | Utility Gain |
|---|---|---|---|---|---|---|---|---|
| 5 | Sample | 12.167 | 0.202 | −0.223 | 0.604 | 4.611 | 18.666 | - |
| 5 | Five-Factor | 10.958 | 0.194 | −0.217 | 0.564 | 4.356 | 18.218 | 6.43% |
| 5 | Linear | 11.545 | 0.198 | −0.220 | 0.582 | 4.389 | 18.191 | 4.06% |
| 5 | QIS | 11.430 | 0.196 | −0.217 | 0.582 | 4.264 | 18.095 | 5.32% |
| 5 | LIS | 11.305 | 0.196 | −0.217 | 0.578 | 4.274 | 18.122 | 5.27% |
| 5 | GIS | 11.379 | 0.196 | −0.217 | 0.581 | 4.269 | 18.025 | 5.35% |
| 10 | Sample | 11.958 | 0.202 | −0.228 | 0.593 | 4.785 | 19.218 | - |
| 10 | Five-Factor | 10.334 | 0.191 | −0.217 | 0.540 | 4.464 | 18.342 | 8.94% |
| 10 | Linear | 10.989 | 0.196 | −0.221 | 0.560 | 4.510 | 18.257 | 6.46% |
| 10 | QIS | 10.832 | 0.193 | −0.217 | 0.553 | 4.369 | 18.112 | 10.12% |
| 10 | LIS | 10.684 | 0.193 | −0.217 | 0.554 | 4.383 | 18.138 | 9.95% |
| 10 | GIS | 10.772 | 0.193 | −0.217 | 0.557 | 4.376 | 18.126 | 10.02% |
Table 5 further compares the out-of-sample forecasting performance of the different estimators. The results indicate that, across all moment orders, the nonlinear shrinkage method outperforms alternative approaches, confirming its effectiveness in enhancing estimation accuracy. Moreover, the PRIAL values of the nonlinear shrinkage estimator relative to the linear shrinkage estimator are overall positive, with the most pronounced improvements observed for covariance and cokurtosis, while the gains for coskewness are relatively limited. These findings are consistent with the conclusions drawn from the preceding simulation analysis.
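As an illustration of how a PRIAL comparison between two estimators can be computed, the sketch below uses the common definition of the percentage relative improvement in average loss with a squared Frobenius loss. The paper's exact loss function and averaging scheme are not given in this section, so the definition, the function names, and the toy matrices are assumptions. The same code applies to covariance, coskewness, and cokurtosis estimates as long as each is stored as a rectangular matrix (nested lists).

```python
def squared_frobenius_loss(estimate, target):
    """Sum of squared element-wise differences between two equally sized matrices."""
    return sum((e - t) ** 2
               for row_e, row_t in zip(estimate, target)
               for e, t in zip(row_e, row_t))


def prial(candidate, benchmark, target):
    """Percentage relative improvement of `candidate` over `benchmark`.

    Positive values mean the candidate is closer to the target than the
    benchmark; 100 means the candidate matches the target exactly.
    """
    base = squared_frobenius_loss(benchmark, target)
    if base == 0.0:
        raise ValueError("benchmark already matches the target; PRIAL is undefined")
    return 100.0 * (1.0 - squared_frobenius_loss(candidate, target) / base)


# Toy example: the candidate halves the benchmark's error on the diagonal.
target = [[1.0, 0.0], [0.0, 1.0]]
linear_estimate = [[2.0, 0.0], [0.0, 2.0]]       # hypothetical benchmark
nonlinear_estimate = [[1.5, 0.0], [0.0, 1.5]]    # hypothetical candidate
improvement = prial(nonlinear_estimate, linear_estimate, target)
```

Here the benchmark loss is 2.0 and the candidate loss is 0.5, so the candidate recovers three quarters of the benchmark's squared error. In an out-of-sample setting the target would be a realized moment matrix from the evaluation window, and the losses would typically be averaged over rolling windows before the ratio is taken.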
In summary, by optimizing the eigenvalue spectrum, nonlinear shrinkage improves both estimation accuracy and robustness, representing a practical approach for portfolio construction.
7. Conclusions
This paper extends nonlinear shrinkage estimation from covariance matrices to higher-order moment matrices within a multifactor framework. By integrating factor-model dimension reduction, tensor supersymmetry, and nonlinear eigenvalue shrinkage, the proposed approach mitigates dimensionality issues and enhances estimation robustness in complex financial systems. Theoretical results establish its asymptotic consistency, and Monte Carlo simulations confirm substantial reductions in MSE and improvements in PRIAL relative to sample and linear shrinkage benchmarks. Empirical evidence from weekly A-share data demonstrates that portfolios constructed from nonlinear-shrinkage higher-order moments achieve higher risk-adjusted performance, characterized by increased annualized returns, reduced kurtosis, and smaller drawdowns, particularly in large-asset universes. These findings highlight the practical economic relevance of directly addressing estimation uncertainty through localized eigenvalue adjustment.
This study also contributes to the broader literature on random matrix theory, robust covariance estimation, and high-dimensional portfolio optimization. The concept of shrinkage estimation originated from Stein’s pioneering work on multi-parameter estimation [28]. Building on this foundation, Ledoit and Wolf and Ledoit and Péché developed eigenvalue-shrinkage techniques under the random matrix theory framework [29,30,31], establishing a rigorous foundation for noise reduction in large-dimensional covariance estimation. Subsequent advances in robust and approximate factor modeling by Bai and Ng; Fan, Liao, and Mincheva; and Onatski further addressed weak-factor structures and cross-sectional dependence [25,26,27]. From an asset-pricing perspective, stochastic volatility and multifactor models [32,33] link higher-order moment dynamics to option-implied risk premia [32,34]. Future research could explore dynamic extensions such as time-varying shrinkage [33], dynamic equicorrelation [35], regime-switching dependence [36], and realized high-frequency moment estimation [37] to better capture market nonstationarity and structural change.