Article

Parity Regression Estimation †

1
Bayes Business School, City St George’s, University of London, 106 Bunhill Row, London EC1Y 8TZ, UK
2
Faculty of Mathematics and Computer Science, University of Bucharest, Str. Academiei 14, 010014 Bucharest, Romania
3
Research Unit 5, Simion Stoilow Institute of Mathematics of the Romanian Academy, C.P. 1-764, 010702 Bucharest, Romania
4
Department of Economics, Business, Mathematics and Statistics (DEAMS), Università Degli Studi di Trieste, Via Università 1, 34127 Trieste, Italy
*
Author to whom correspondence should be addressed.
An R package implementing our estimators is available on CRAN at https://cran.r-project.org/web/packages/savvyPR/index.html (accessed on 17 March 2026).
Risks 2026, 14(4), 94; https://doi.org/10.3390/risks14040094
Submission received: 5 March 2026 / Revised: 9 April 2026 / Accepted: 13 April 2026 / Published: 21 April 2026

Abstract

Multiple linear regression remains a foundational predictive methodology across a broad range of applications. We propose a novel regression framework that, rather than minimising the aggregate prediction error associated with the dependent variable, explicitly distributes the risk evenly across all model parameters. This approach provides a structural safeguard that is particularly suitable for data affected by substantial noise, as is often the case in time series environments characterised by regime shifts, structural breaks, and evolving trends. We provide a theoretical characterisation of our proposed estimator, named Parity Regression, and benchmark its analytical properties against existing penalised and shrinkage estimators in the literature. Both synthetic experiments and empirical applications demonstrate that the theoretical guarantees of the proposed method translate into enhanced out-of-sample forecasting stability in practice.
JEL Classification:
C13; C53; C58; G17

1. Introduction

Multiple linear regression is one of the most widely used predictive and inferential models across a broad range of scientific disciplines, including economics, engineering, medicine, and the social sciences. The model relates a scalar response random variable $Y$ to a set of $p$ explanatory random variables (covariates) $X_1, X_2, \ldots, X_p$ through the linear model
$$Y = \theta_0 + \theta_1 X_1 + \cdots + \theta_p X_p + \varepsilon, \tag{1}$$
where $\theta = (\theta_0, \theta_1, \ldots, \theta_p)^\top$ is the unknown parameter vector and $\varepsilon$ is a random error term with $\mathbb{E}[\varepsilon] = 0$. For a sample of size $n$ drawn from $(Y, X)$, let $y = (y_1, y_2, \ldots, y_n)^\top$ denote the observed vector of responses and $\tilde{X}$ be the $n \times (p+1)$ design matrix with rows $\tilde{x}_i^\top$, where $\tilde{x}_i = (1, x_i^\top)^\top$ for all $1 \le i \le n$.
A common objective in regression analysis is to estimate $\theta$ via estimators with a low mean squared error (MSE); the MSE of an estimator is explicitly defined in (3). This is typically achieved by minimising a loss functional $L$ measuring the discrepancy between the dependent variable and its linear predictor. The standard choice is the $\ell_2$ loss, which leads to the well-known Ordinary Least Squares (OLS) estimator, denoted by $\hat{\theta}^{\mathrm{OLS}}$ and obtained by minimising the residual sum of squares (RSS):
$$\hat{\theta}^{\mathrm{OLS}} := \operatorname*{argmin}_{\theta \in \mathbb{R}^{p+1}} \mathrm{RSS}(\theta), \quad \text{where} \quad \mathrm{RSS}(\theta) := \frac{1}{n} \sum_{i=1}^{n} \left( \theta^\top \tilde{x}_i - y_i \right)^2 .$$
Here, $L = \mathrm{RSS}$. The OLS estimator has the closed-form solution
$$\hat{\theta}^{\mathrm{OLS}} = \left( \tilde{X}^\top \tilde{X} \right)^{-1} \tilde{X}^\top y, \tag{2}$$
provided that $\tilde{X}^\top \tilde{X}$ is invertible. Note that $\tilde{X}^\top \tilde{X}$ is a symmetric matrix that is always positive semi-definite but not necessarily positive definite; only positive definiteness guarantees the existence of $(\tilde{X}^\top \tilde{X})^{-1}$. The OLS estimator is often called the Best Linear Unbiased Estimator (BLUE) according to the Gauss–Markov Theorem (Gauss 1821; Markov 1912), which guarantees it has the lowest variance among all unbiased linear estimators. Furthermore, if the error term is normally distributed, OLS coincides with the maximum likelihood estimator, allowing for exact finite-sample inference (Seber and Lee 2003).
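As a minimal sketch of the closed-form OLS solution, the following NumPy snippet builds the design matrix with a leading column of ones, checks that the Gram matrix is positive definite, and solves the normal equations. The synthetic data, the seed, and all variable names are assumptions made for illustration; they are not taken from the paper or from the savvyPR package.

```python
import numpy as np

# Synthetic data (illustrative only): n observations, p covariates.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, 2.0, -1.0, 0.5])    # (theta_0, ..., theta_p), assumed
X_tilde = np.column_stack([np.ones(n), X])       # rows are (1, x_i^T)
y = X_tilde @ theta_true + rng.normal(scale=0.1, size=n)

gram = X_tilde.T @ X_tilde                       # symmetric, positive semi-definite
min_eig = np.linalg.eigvalsh(gram).min()         # > 0 means positive definite

# Closed form: solve the normal equations rather than forming the inverse.
theta_ols = np.linalg.solve(gram, X_tilde.T @ y)

# Cross-check against NumPy's least-squares routine.
theta_lstsq = np.linalg.lstsq(X_tilde, y, rcond=None)[0]
```

Solving the linear system instead of inverting the Gram matrix explicitly is a common numerical choice; when the smallest eigenvalue is close to zero, both routes become unstable, which is the situation the regularised estimators discussed later are meant to address.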
A limitation of (2) is that its estimation error is driven by the estimation error of $\tilde{X}^\top \tilde{X}$, which can be highly problematic since the empirical eigenvalues of a matrix are often poor estimators of the population eigenvalues; for a detailed discussion, see Asimit et al. (forthcoming-b), with a summary provided in the Literature Review. When the sample is affected by substantial noise, as is common in time series data with structural changes or evolving trends, the out-of-sample (OOS) performance of OLS deteriorates further. This motivates the need for a more robust linear regression estimator suitable for such settings.
In this paper, we introduce our novel regression method, Parity Regression (PR), and outline three primary contributions. First, we propose the PR estimator, which, rather than minimising the global empirical risk, distributes the prediction error evenly across all model parameters, and we provide a rigorous theoretical characterisation of it. Second, we empirically show that our estimator outperforms OLS, as well as existing penalised and shrinkage estimators, both on synthetic simulations and real-world datasets. Third, to facilitate the reproduction of results and practical application, we have made the proposed estimator publicly available via the R package savvyPR on CRAN.

Literature Review

Our review of the literature begins with Stein’s paradox (James and Stein 1961; Stein 1956), which marked a fundamental shift in statistical thinking by demonstrating that shrinkage can systematically improve estimation accuracy under the MSE criterion. By showing that deliberate introduction of bias may reduce the overall estimation error through a variance–bias trade-off, it provided a conceptual foundation for modern regularisation techniques. Although shrinkage in its classical form emerged after Tikhonov regularisation (Tikhonov et al. 1943), which laid the groundwork for penalised regression, the underlying principle is closely related. The common thread linking penalised regression and shrinkage, despite their origins in different applications, is that a controlled introduction of bias can substantially reduce the estimator variability, thereby yielding an estimator that outperforms the natural unbiased estimator.
We begin by setting aside the estimation of the regression parameter vector $\theta$ and instead consider the problem of estimating the population mean vector $\mu$, thereby clarifying the foundations of the shrinkage principle arising from Stein’s paradox. This paradox, introduced in the seminal papers by James and Stein (1961) and Stein (1956), fundamentally challenges the established statistical paradigm. Its core premise is that, although unbiased estimators possess robust theoretical properties, they may still be strictly suboptimal when efficiency is assessed using the MSE criterion. Recall that the MSE of a generic estimator $\hat{\theta}$ of $\theta$ is defined as
$$\mathrm{MSE}(\hat{\theta}) := \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2 . \tag{3}$$
The estimator proposed in James and Stein (1961) and Stein (1956), commonly referred to as the James–Stein estimator, demonstrates that the sample mean vector $\bar{X} \in \mathbb{R}^p$ is a sub-optimal estimator of the population mean vector $\mu$. Assuming a multivariate normal sampling distribution, an estimator that strictly dominates the sample mean in terms of MSE can be constructed via multiplicative shrinkage, $\hat{\mu} = c \bar{X}$, where $c$ represents the theoretically optimal shrinkage intensity. This estimator is often termed the oracle shrinkage estimator, as it still depends on unknown population parameters and is therefore not fully data-driven. In practice, substituting these unknown population parameters with sample estimates gives a fully data-driven counterpart, often referred to as a bona fide shrinkage estimator. For example, the James–Stein estimator derived in James and Stein (1961) is given by
$$\hat{\mu}^{\mathrm{PJS}} := \left( 1 - \frac{(p-2)\,\hat{\sigma}^2}{n \, \|\bar{X}\|_2^2} \right)_{+} \bar{X}, \quad \text{for all } p \ge 3 \text{ and } n \ge 2,$$
where $t_+ := \max(t, 0)$ and $\hat{\sigma}^2 := \frac{1}{p} \mathrm{Tr}(S)$, with $S$ denoting the sample covariance matrix estimator; note that $\|\cdot\|_p$ denotes the usual $p$-norm. For a comprehensive treatment of mean vector shrinkage estimation, the reader is referred to Asimit et al. (forthcoming-c) and Bodnar et al. (2022).
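The positive-part James–Stein formula above can be sketched in a few lines of NumPy. The helper name `james_stein_positive_part`, the simulated sample, and the choice of true mean are assumptions for illustration only; the snippet simply evaluates the shrinkage factor and applies it to the sample mean.

```python
import numpy as np

def james_stein_positive_part(sample):
    """Positive-part James-Stein estimate of the mean from an (n, p) sample."""
    n, p = sample.shape
    if p < 3 or n < 2:
        raise ValueError("requires p >= 3 and n >= 2")
    x_bar = sample.mean(axis=0)
    S = np.cov(sample, rowvar=False)             # p x p sample covariance
    sigma2_hat = np.trace(S) / p                 # average sample variance
    # Shrinkage factor, truncated at zero (the "positive part").
    factor = max(1.0 - (p - 2) * sigma2_hat / (n * float(x_bar @ x_bar)), 0.0)
    return factor * x_bar, factor

rng = np.random.default_rng(0)
n, p = 20, 10
mu_true = np.full(p, 0.1)                        # assumed true mean, for illustration
sample = mu_true + rng.normal(size=(n, p))
mu_js, factor = james_stein_positive_part(sample)
x_bar = sample.mean(axis=0)
```

Because the factor lies in $[0, 1]$, the estimate is always a contraction of the sample mean towards the origin; the truncation at zero prevents the sign flip that the untruncated factor would produce when the sample mean is very close to the origin.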
Beyond yielding mean vector estimators with strictly reduced estimation error, Stein’s paradox establishes a generalised shrinkage principle that extends well beyond the confines of high-dimensional mean estimation. In particular, this principle can be applied to the estimation of the regression parameter vector θ . We therefore provide a succinct review of shrinkage estimators, which deliberately introduce bias in order to reduce the overall MSE through a variance-bias trade-off, a concept that lies at the core of Cross Validation (CV) in statistics and machine learning.
An alternative to OLS is Ridge Regression (RR), introduced by Hoerl and Kennard (1970), which is designed to mitigate overfitting by shrinking the regression parameters and is particularly useful in the presence of multicollinearity or ill-conditioning (i.e., when $\tilde{X}^\top \tilde{X}$ possesses zero or near-zero eigenvalues). RR minimises the $L_2$-penalised RSS as follows:
$$\hat{\theta}^{\mathrm{RR}}(\lambda) := \operatorname*{argmin}_{\theta \in \mathbb{R}^{p+1}} \left\{ \mathrm{RSS}(\theta) + \lambda \|\theta\|_2^2 \right\}, \quad \lambda \ge 0, \tag{4}$$
where $\lambda$ is a tuning parameter controlling the strength of the penalty. The solution admits the closed-form expression
$$\hat{\theta}^{\mathrm{RR}}(\lambda) = \left( \tilde{X}^\top \tilde{X} + \lambda I_{p+1} \right)^{-1} \tilde{X}^\top y, \tag{5}$$
where $\lambda > 0$ guarantees that $(\tilde{X}^\top \tilde{X} + \lambda I_{p+1})^{-1}$ exists. The penalisation term reduces estimation error, particularly when some eigenvalues of $\tilde{X}^\top \tilde{X}$ are zero or close to zero.
By standard duality arguments, (4) is equivalent to the constrained formulation
$$\min_{\theta \in \mathbb{R}^{p+1}} \mathrm{RSS}(\theta) \quad \text{subject to} \quad \sum_{k=0}^{p} \theta_k^2 \le \tilde{\lambda}, \quad \tilde{\lambda} \ge 0, \tag{6}$$
where $\tilde{\lambda}$ controls the size of the constraint set.
RR is a particular case of Tikhonov regularisation (Tikhonov et al. 1943), a broader framework for addressing ill-posed estimation problems. Specifically, for a penalty function $g : \mathbb{R}^{p+1} \to \mathbb{R}_+$, the Tikhonov estimator is defined as
$$\hat{\theta} := \operatorname*{argmin}_{\theta \in \mathbb{R}^{p+1}} \left\{ \frac{1}{2} \| y - \tilde{X} \theta \|_2^2 + g(\theta) \right\} .$$
When $g(\theta) = \lambda \|\theta\|_2^2$, the Tikhonov estimator reduces to the RR estimator. Since
$$\hat{\theta}^{\mathrm{RR}}(0) = \hat{\theta}^{\mathrm{OLS}} \quad \text{and} \quad \hat{\theta}^{\mathrm{RR}}(\lambda) \to 0 \ \text{ as } \ \lambda \to \infty,$$
RR is a shrinkage estimator that increasingly biases the estimates towards the origin as $\lambda$ grows. Hoerl and Kennard (1970) showed that there exists an oracle estimator $\lambda > 0$ such that
$$\mathrm{MSE}\left( \hat{\theta}^{\mathrm{RR}}(\lambda) \right) < \mathrm{MSE}\left( \hat{\theta}^{\mathrm{OLS}} \right),$$
demonstrating that RR can outperform OLS when $\lambda$ is suitably chosen. CV provides a practical method for selecting a bona fide estimate of $\lambda$; however, the in-sample optimal choice may not be close to the OOS optimal choice, which may increase the estimation error of the linear model.
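The shrinkage behaviour of RR can be illustrated with a short NumPy sketch of the closed-form ridge expression. The helper `ridge`, the nearly collinear synthetic design, and the grid of penalty values are assumptions chosen for illustration; the snippet checks that a zero penalty reproduces OLS and records how the norm of the estimate falls as the penalty grows.

```python
import numpy as np

def ridge(X_tilde, y, lam):
    """Closed-form ridge estimate (X~'X~ + lam * I)^{-1} X~'y."""
    k = X_tilde.shape[1]
    return np.linalg.solve(X_tilde.T @ X_tilde + lam * np.eye(k), X_tilde.T @ y)

# Nearly collinear design (illustrative): x2 is x1 plus a small perturbation.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X_tilde = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + x1 + x2 + rng.normal(scale=0.5, size=n)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0, 1e6]
# Euclidean norm of the ridge estimate along the penalty grid.
norms = [float(np.linalg.norm(ridge(X_tilde, y, lam))) for lam in lambdas]

# With lam = 0 the ridge estimate coincides with OLS.
theta_ols = np.linalg.lstsq(X_tilde, y, rcond=None)[0]
```

In this sketch the penalty is applied to all coefficients including the intercept, mirroring the formula in the text; many software implementations leave the intercept unpenalised, so results from such packages need not match this snippet.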
When $g(\theta) = t \|\theta\|_1$ with $t \ge 0$, the Tikhonov estimator reduces to the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani 1996) and Basis Pursuit Denoising (Chen and Donoho 1994). The $L_1$-norm penalty induces sparsity by driving certain coefficients exactly to zero, thereby performing explicit variable selection and regularisation. Owing to the non-differentiability of the $L_1$ penalty, LASSO does not admit a closed-form solution; however, it is equivalent to the constrained optimisation problem
$$\min_{\theta \in \mathbb{R}^{p+1}} \mathrm{RSS}(\theta) \quad \text{subject to} \quad \sum_{k=0}^{p} |\theta_k| \le \tilde{t}, \quad \tilde{t} \ge 0 .$$
Although both LASSO and RR regularise the model, they differ fundamentally in their mechanisms. LASSO induces sparsity by setting certain parameters exactly to zero, thereby selecting a subset of predictors and enhancing interpretability. In contrast, RR shrinks the parameters continuously towards zero without eliminating any of them entirely.
RR and LASSO are examples of penalised regression methods that can be interpreted as shrinkage estimators, although they are not constructed explicitly as shrinkage procedures. In contrast, there exists a broad class of estimators that directly shrink the OLS estimator towards a specified target. Such shrinkage estimators are typically simple, admitting closed-form expressions that are designed to optimise the theoretical MSE. Ideally, the corresponding oracle optimal shrinkage estimator is available in closed form, with its plug-in counterpart serving as a bona fide estimator, although CV may alternatively be employed. While both approaches exhibit distinct computational and theoretical trade-offs during implementation, this practical distinction remains largely underexplored in the existing literature.
The Liu estimator (Liu 1993), hereafter Liu, is a shrinkage estimator that directly shrinks the OLS estimator towards the target $(\tilde{X}^\top \tilde{X} + I_{p+1})^{-1} \tilde{X}^\top y$. Specifically, it modifies the OLS estimator as follows:
$$\hat{\theta}^{\mathrm{Liu}}(d) = \left( \tilde{X}^\top \tilde{X} + I_{p+1} \right)^{-1} \left( \tilde{X}^\top y + d \, \hat{\theta}^{\mathrm{OLS}} \right), \tag{9}$$
where $d \in (0, 1)$ is a shrinkage parameter. Under certain conditions, Liu (1993) showed that there exists an optimal $d \in (0, 1)$ such that
$$\mathrm{MSE}\left( \hat{\theta}^{\mathrm{Liu}}(d) \right) < \mathrm{MSE}\left( \hat{\theta}^{\mathrm{OLS}} \right) .$$
Hence, the oracle optimal shrinkage estimator $d$ admits a closed-form expression. Nevertheless, in practice, standard software implementations typically select $d$ via CV, similarly to the RR estimator. Finally, note that $\hat{\theta}^{\mathrm{Liu}}(1) = \hat{\theta}^{\mathrm{OLS}}$.
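A small NumPy sketch can confirm the two endpoint properties of the Liu estimator: setting $d = 1$ recovers OLS, while $d = 0$ returns the shrinkage target. The helper `liu` and the simulated data are assumptions for illustration, and $d$ is simply passed in by hand rather than selected by CV or by an oracle formula.

```python
import numpy as np

def liu(X_tilde, y, d):
    """Liu estimate (X~'X~ + I)^{-1} (X~'y + d * theta_OLS)."""
    k = X_tilde.shape[1]
    gram = X_tilde.T @ X_tilde
    theta_ols = np.linalg.solve(gram, X_tilde.T @ y)
    return np.linalg.solve(gram + np.eye(k), X_tilde.T @ y + d * theta_ols)

# Illustrative synthetic data (assumed, not from the paper).
rng = np.random.default_rng(3)
n, p = 80, 3
X_tilde = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X_tilde @ np.array([0.5, 1.0, -2.0, 1.5]) + rng.normal(size=n)

theta_ols = np.linalg.lstsq(X_tilde, y, rcond=None)[0]
# Shrinkage target: the d = 0 endpoint.
target = np.linalg.solve(X_tilde.T @ X_tilde + np.eye(p + 1), X_tilde.T @ y)

theta_liu_half = liu(X_tilde, y, 0.5)            # an intermediate amount of shrinkage
```

The estimate is affine in $d$, so every value in $(0, 1)$ lies on the segment between the target and OLS.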
Liu (2003) extended this framework by proposing a two-parameter Liu estimator to address multicollinearity more effectively. The objective is for this estimator to inherit the stabilising properties of RR, which is specifically designed to accommodate ill-conditioned design matrices where $\tilde{X}^\top \tilde{X}$ exhibits zero or near-zero eigenvalues, a hallmark of multicollinearity. The two-parameter Liu estimator introduces an additional parameter $k$ to provide finer control over the shrinkage effect, while retaining the adjustment governed by $d$. It is defined as
$$\hat{\theta}^{\mathrm{Liu\text{-}type}}(k, d) = \left( \tilde{X}^\top \tilde{X} + k I_{p+1} \right)^{-1} \left( \tilde{X}^\top y - d \, \hat{\theta}^{\mathrm{RR}} \right) .$$
Liu (2003) showed that for any $k > 0$, there exists an optimal $d$ such that
$$\mathrm{MSE}\left( \hat{\theta}^{\mathrm{Liu\text{-}type}}(k, d) \right) \le \mathrm{MSE}\left( \hat{\theta}^{\mathrm{RR}} \right) .$$
Although this estimator offers greater flexibility, both parameters must typically be selected via CV in practical implementations. Joint optimisation of $(k, d)$ entails a considerable computational burden, and the additional estimation variability may increase the overall MSE. Thus, despite its appealing theoretical guarantees, the two-parameter Liu estimator is less practical for empirical applications, and we exclude it from our current implementation.
The remainder of the paper is organised as follows. Section 2 presents the main theoretical results. Section 3 reports an informative simulation study, while Section 4 provides a comprehensive real-data analysis. Concluding remarks are given in Section 5. All proofs and supporting technical details are collected in three appendices. Appendix A contains the proofs of all theoretical results. Appendix B provides additional details on the data-generating process underlying the simulation study. Appendix C includes further information on the datasets used in Section 4.

2. Main Results

This section presents all the theoretical results of the paper. We begin with Section 2.1, which provides a general overview of parity estimation, including related theoretical results underpinning the concept. This foundational theory clarifies how parity estimation is particularly applied to multiple linear regression; for further details, see Section 2.2. The section concludes with Section 2.3, where we aim to enhance the explainability of the proposed concepts developed in Section 2.1 and Section 2.2.
Before presenting the main results, we introduce some notation. The symbol $\succeq$ indicates that one symmetric matrix is greater than or equal to another in the Loewner ordering, meaning that their difference is positive semidefinite, whereas $\succ$ denotes strict dominance, meaning that their difference is positive definite. In particular, $\Sigma \succ 0$ ($\Sigma \succeq 0$) indicates that $\Sigma$ is positive definite (positive semidefinite). Additionally, $\mathrm{diag}(A)$ denotes the diagonal matrix formed from the diagonal elements of the matrix $A$.

2.1. Parity Estimation

In statistics and machine learning, predictive models aim to relate a dependent target variable $Y$ to a covariate (feature) vector $X = (X_1, X_2, \ldots, X_p)^\top$. Let $\ell : \mathbb{R} \times \mathbb{R}^p \times \Theta \to \mathbb{R}_+$ denote a loss function, where $\Theta \subseteq \mathbb{R}^q$ is the feasible set of the $q$-dimensional parameter vector $\theta$, and $q > p$. In a multiple linear regression model comprising $p$ covariates and an intercept term, we have $q = p + 1$. As discussed in Section 2.2, when a synthetic term is introduced, this dimension $q$ is further extended to $q = p + 2$. The model parameters $\theta$ are estimated by minimising the expected loss, $L(\theta) := \mathbb{E}[\ell(Y, X; \theta)]$. It is often assumed that $L : \Theta \to (0, \infty]$ is differentiable, which we assume throughout the paper. The estimation problem therefore reduces to finding
$$\hat{\theta} \in \operatorname*{argmin}_{\theta \in \Theta} L(\theta) .$$
We now introduce the notion of parity estimation, which, to the best of our knowledge, has not previously been studied. A parity estimator is a vector $\theta \in \Theta$ such that $\theta_i \ne 0$ for all $i = 1, 2, \ldots, q$, and
$$\frac{\partial L(\theta)}{\partial \theta_k} \Big/ \frac{L(\theta)}{\theta_k} = \frac{\partial L(\theta)}{\partial \theta_l} \Big/ \frac{L(\theta)}{\theta_l} \quad \text{for all } 1 \le k < l \le q . \tag{11}$$
Condition (11) states that the elasticity of the loss function L is the same across all components of the parameter vector θ . This is inspired by the theoretical foundation of capital allocation in linear risk portfolios; for a brief overview of this concept, we refer the reader to (Asimit et al. 2011, 2013, 2019; Tasche 1999). We are now ready to introduce the additional assumptions required for our main results, stated as Assumptions 1 and 2.
Assumption 1.
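Assume that the feasible set $\Theta \subseteq \mathbb{R}^q$ is the parametric cone
$$\mathcal{K}_q(\delta) := \left\{ \theta \in \mathbb{R}^q : \delta_i \theta_i > 0, \ i = 1, \ldots, q \right\},$$
where $\delta \in \{-1, 1\}^q$.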
Assumption 2.
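The loss function $L$ satisfies the following growth condition: there exist constants $M_1, M_2 > 0$ such that
$$L(\theta) \ge M_1 \|\theta\|_\infty \quad \text{for all } \theta \in \Theta \text{ with } \|\theta\|_\infty > M_2, \quad \text{where } \|\theta\|_\infty = \max_i |\theta_i| .$$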
We are now ready to present our first main result, stated as Theorem 1, which provides a characterisation of parity estimation.
Theorem 1.
Under Assumption 1, the following results hold.
(i)
For any $\mu > 0$, any solution of (12)
$$\min_{\theta \in \mathcal{K}_q(\delta)} \ L(\theta) - \mu \sum_{k=1}^{q} \log(\delta_k \theta_k), \tag{12}$$
is a parity estimator.
(ii)
If L is convex and Assumption 2 holds, then (12) admits a unique solution for any μ > 0 .
(iii)
Assume that $L(\theta)$ is convex and homogeneous of order $\tau \ge 1$. Then, for any $\mu > 0$, (12) admits a unique solution $\theta(\mu)$, which satisfies
$$L(\theta(\mu)) = \frac{q \mu}{\tau} \quad \text{and} \quad \theta(\mu) = \mu^{1/\tau} \theta(1) .$$
Further, $\tilde{\theta} \in \mathcal{K}_q(\delta)$ is a parity estimator in (11) if and only if there exists $\mu_0 > 0$ such that $\tilde{\theta} = \mu_0^{1/\tau} \theta(1)$.
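To make Theorem 1 concrete, consider a hedged worked example in which the loss is a diagonal quadratic form. This loss, and the coefficients $a_k > 0$, are assumptions introduced purely for illustration; they do not appear in the paper. The loss is convex and homogeneous of order $\tau = 2$, so the barrier problem (12) can be solved in closed form and each claim of the theorem can be checked by hand.

```latex
% Assumed illustrative loss (not from the paper): diagonal quadratic, a_k > 0, tau = 2.
L(\theta) = \sum_{k=1}^{q} a_k \theta_k^2 .

% First-order condition of the barrier problem (12), for each k:
2 a_k \theta_k - \frac{\mu}{\theta_k} = 0
\quad \Longrightarrow \quad
\theta_k(\mu) = \delta_k \sqrt{\frac{\mu}{2 a_k}} .

% Parity: every component carries the same elasticity.
\frac{\partial L(\theta)}{\partial \theta_k} \Big/ \frac{L(\theta)}{\theta_k}
  = \frac{2 a_k \theta_k(\mu)^2}{L(\theta(\mu))}
  = \frac{\mu}{L(\theta(\mu))}
  \quad \text{for every } k .

% Loss level and scaling, matching Theorem 1 (iii):
L(\theta(\mu)) = \sum_{k=1}^{q} a_k \cdot \frac{\mu}{2 a_k} = \frac{q \mu}{2} = \frac{q \mu}{\tau},
\qquad
\theta(\mu) = \mu^{1/2} \, \theta(1) .
```

In this example the common elasticity equals $\mu / L(\theta(\mu)) = \tau / q$, so changing $\mu$ rescales the solution along a ray inside the cone without altering how the loss is shared among the components.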
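Condition (11) may be relaxed to define a partial parity estimator. Specifically, a vector $\theta \in \Theta$ is called a partial parity estimator if $\theta_i \ne 0$ for all $i = 1, 2, \ldots, q_0$, and
$$\frac{\partial L(\theta)}{\partial \theta_{k_1}} \Big/ \frac{L(\theta)}{\theta_{k_1}} = \frac{\partial L(\theta)}{\partial \theta_{k_2}} \Big/ \frac{L(\theta)}{\theta_{k_2}} \quad \text{for all } 1 \le k_1 < k_2 \le q_0 . \tag{13}$$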
Condition (13) states that the elasticity of the loss function L is identical across a specified subset of components of the parameter vector θ .
We are now ready to present our second main result, stated as Proposition 1, which provides a characterisation of partial parity estimation.
Proposition 1.
Let Assumption 1 hold and let $t = (t_{q_0+1}, t_{q_0+2}, \ldots, t_q)^\top \ge 0_{q-q_0}$, where $0_{q-q_0}$ is the zero vector of dimension $q - q_0$. Then the following results hold.
(i)
For any $\mu > 0$ and $t \ge 0_{q-q_0}$, any solution of (14)
$$\min_{\theta \in \mathcal{K}_q(\delta)} \ L(\theta) - \mu \sum_{k=1}^{q_0} \log(\delta_k \theta_k) - \mu \sum_{k=q_0+1}^{q} t_k \log(\delta_k \theta_k), \tag{14}$$
is a partial parity estimator in the sense of (13).
(ii)
Assume that $L$ is convex and Assumption 2 holds. Then, (14) admits a unique solution for any $\mu > 0$ and $t \ge 0_{q-q_0}$.
(iii)
Assume that $L(\theta)$ is convex and homogeneous of order $\tau \ge 1$ in $\theta$. Then, for any $\mu > 0$ and $t \ge 0_{q-q_0}$, (14) admits a unique solution $\theta(\mu, t)$, which satisfies
$$L(\theta(\mu, t)) = \frac{\left( q_0 + \mathbf{1}^\top t \right) \mu}{\tau} \quad \text{and} \quad \theta(\mu, t) = \mu^{1/\tau} \theta(1, t) .$$
The proof of Proposition 1 is omitted, as it follows the same reasoning used in the proof of Theorem 1. Note that the final statement in Proposition 1 (iii) does not hold on an “if and only if” basis, unlike its counterpart in Theorem 1 (iii). Specifically, we cannot assert that, for any interior point $\tilde{\theta} \in \mathcal{K}_q(\delta)$ satisfying (13), there exist $\mu_0 > 0$ and $t_0 \ge 0_{q-q_0}$ such that $\tilde{\theta} = \mu_0^{1/\tau} \theta(1, t_0)$. This means that the parametric optimal solutions in (14), with parameters $(\mu, t) \in \mathbb{R}_{++} \times \mathbb{R}_+^{q-q_0}$, may yield a large set of vectors satisfying (13), but not necessarily all of them. This contrasts with Theorem 1, where there is a one-to-one correspondence between the parametric set of optimal solutions in (12), parameterised by $\mu \in \mathbb{R}_{++}$, and the set of vectors satisfying (11). Consequently, it is practical to search for optimal solutions in (14) over $(\mu, 0_{q-q_0})$ with $\mu > 0$. That is, a partial parity estimator can be obtained as the parametric set of solutions in $\mu$, given by
$$\min_{\theta \in \mathcal{K}_{q_0}(\delta)} \ L(\theta) - \mu \sum_{k=1}^{q_0} \log(\delta_k \theta_k), \quad \text{with } \mu > 0, \tag{15}$$
since this formulation effectively minimises the loss with respect to the remaining components ( θ q 0 + 1 , , θ q ) .
Section 2.2 provides a broad overview of how parity estimation operates in the context of multiple linear regression, while also presenting some specific theoretical results for linear regression models.

2.2. Parity Estimation for Linear Regression

We begin by noting that the Parity Regression (PR), the parity estimation framework for linear regression, shares conceptual similarities with well-known penalised regression methods, as it introduces specific constraints. Figure 1 provides a simple geometric interpretation of PR estimation, reminiscent of the classic geometric representations of RR; for example, see Figure 3.11 in Hastie et al. (2009) or Figure 2 in Tibshirani (1996).
The illustration in Figure 1 provides an intuitive explanation for the presence of a single solution in each quadrant when p = 1 , and the same reasoning extends naturally to cases with p > 1 . Theorem 1 and Proposition 1 provide the theoretical justification for this geometric interpretation.
The PR method builds upon the structure of RR by introducing elasticity constraints that ensure the resulting loss function is homogeneous and convex in a specific form. This can be formalised by defining the total expected loss as
$$\mathrm{RRSS}(\hat{\theta}; \lambda) := \frac{1}{n} \sum_{i=1}^{n} \left( \theta^\top \tilde{x}_i - \theta_{p+1} y_i \right)^2 + \lambda \sum_{k=0}^{p} \theta_k^2 = \frac{1}{n} \hat{\theta}^\top Z^\top Z \hat{\theta} + \lambda \, \hat{\theta}^\top \mathrm{diag}(\mathbf{1}_{p+1}, 0) \, \hat{\theta}, \tag{16}$$
where $\hat{\theta} = (\theta^\top, \theta_{p+1})^\top$ extends the parameter vector by including the synthetic term $\theta_{p+1} = 1$ for the dependent variable, $\mathbf{1}_{p+1}$ is a vector of ones of length $p + 1$, and $Z$ is an $n \times (p+2)$ matrix with the $i$th row given by $(\tilde{x}_i^\top, -y_i)$ for all $1 \le i \le n$. The synthetic term is introduced to satisfy the homogeneity condition required by the parity estimator, as stated in Proposition 1: it makes (16) a homogeneous function of order $\tau = 2$ in the extended parameter vector $\hat{\theta} \in \mathcal{K}_{p+2}(\delta)$ while preserving convexity. These properties fulfil the necessary conditions for applying parity estimation, thereby allowing us to directly extend parity estimation principles to the PR framework. To ensure that the loss function RRSS behaves as intended, we impose the following conditions, stated formally in Assumption 3.
Assumption 3.
Let Assumption 1 hold. If $\lambda = 0$, then
$$\mathrm{RRSS}\big( (\theta, 1); 0 \big) > 0 \quad \text{for all } (\theta, 1) \in \mathcal{K}_{p+2}(\delta) . \tag{17}$$
Note that $Z^\top Z \succeq 0$, and therefore, for any given $\lambda > 0$ and search cone $\mathcal{K}_{p+2}(\delta)$, we have
$$\mathrm{RRSS}\big( (\theta, 1); \lambda \big) > 0 \quad \text{for all } (\theta, 1) \in \mathcal{K}_{p+2}(\delta) . \tag{18}$$
Condition (17) may fail only if there exists $\tilde{\theta} \in \mathbb{R}^{p+1}$ such that
$$y_i - \tilde{\theta}_0 - \sum_{k=1}^{p} \tilde{\theta}_k x_{ik} = 0 \quad \text{for all } 1 \le i \le n, \tag{19}$$
with reference to (1). Specifically, (19) can arise in the following scenarios: (i) an imbalanced regression with $n < p + 1$, (ii) strong linear dependence among some features/predictors with $n > p + 1$, or (iii) a special data structure when $n = p + 1$ such that $\hat{\theta}^{\mathrm{OLS}} = \tilde{\theta}$. In these cases, Assumption 3 implies
RRSS ( θ , 1 ) ; 0 = 0 if and only if θ R p + 1 { θ ˜ } .
Our main PR results for this section, namely Theorem 2 and Proposition 2, are now presented. These results establish the existence and uniqueness of the PR estimators under the specified elasticity-based constraints.
Theorem 2.
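Let $\lambda \ge 0$ and $t \ge 0$ for which Assumptions 1 and 3 hold.
(i)
For any $\mu > 0$, the unconstrained optimisation problem
$$\min_{(\theta, \theta_{p+1}) \in \mathcal{K}_{p+2}(\delta)} \ \mathrm{RRSS}(\theta, \theta_{p+1}; \lambda) - \mu \sum_{k=0}^{p} \log(\delta_k \theta_k) - \mu \, t \log(\delta_{p+1} \theta_{p+1}) \tag{20}$$
admits a unique solution, denoted by $\big( \theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu) \big)$, which satisfies the parity conditions in (11). This solution yields the PR estimate
$$\hat{\theta}^{\mathrm{PR}}(\lambda, t) = \frac{\big( \theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu) \big)}{\theta_{p+1}(\lambda, t, \mu)} = \big( \hat{\beta}^{\mathrm{PR}}(\lambda, t), 1 \big), \tag{21}$$
which is independent of $\mu > 0$ for any fixed $(\lambda, t)$. Furthermore, we have
$$\mathrm{RRSS}\big( \theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu); \lambda \big) = \frac{(p + 1 + t) \, \mu}{2}, \tag{22}$$
for any $(\lambda, t, \mu) \in \mathbb{R}_+ \times [0, \infty) \times \mathbb{R}_+$, and
$$\hat{\beta}^{\mathrm{PR}}(\lambda, t) = \theta(\lambda, t, \mu^*) = (\mu^*)^{1/2} \, \theta(\lambda, t, 1), \tag{23}$$
where $\mu^* = \theta_{p+1}(\lambda, t, 1)^{-2}$. Furthermore, define $\tilde{\theta}^{\mathrm{PR}}(\lambda, t) := \big( \tilde{\beta}^{\mathrm{PR}}(\lambda, t), 1 \big)$ as a PR estimate in the search cone $\mathcal{K}_{p+2}(\delta)$, which satisfies (13). Then, $\tilde{\theta}^{\mathrm{PR}}(\lambda, t)$ is the unique solution of (20) with $\mu = \frac{2}{p + 1 + t} \mathrm{RRSS}\big( \tilde{\beta}^{\mathrm{PR}}(\lambda, t), 1; \lambda \big)$. Consequently, we have $\hat{\beta}^{\mathrm{PR}}(\lambda, t) = \tilde{\beta}^{\mathrm{PR}}(\lambda, t)$.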
(ii)
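For any $\tilde{\mu} \in \mathbb{R}$, the constrained optimisation problem
$$\min_{(\theta, \theta_{p+1}) \in \mathcal{K}_{p+2}(\delta)} \ \mathrm{RRSS}(\theta, \theta_{p+1}; \lambda) \quad \text{s.t.} \quad \sum_{k=0}^{p} \log(\delta_k \theta_k) + t \log(\delta_{p+1} \theta_{p+1}) \ge \tilde{\mu} \tag{24}$$
admits a unique solution, denoted by $\big( \theta(\lambda, t, \tilde{\mu}), \theta_{p+1}(\lambda, t, \tilde{\mu}) \big)$. This solution yields the PR estimate
$$\hat{\hat{\theta}}^{\mathrm{PR}}(\lambda, t) = \frac{\big( \theta(\lambda, t, \tilde{\mu}), \theta_{p+1}(\lambda, t, \tilde{\mu}) \big)}{\theta_{p+1}(\lambda, t, \tilde{\mu})} = \big( \hat{\hat{\beta}}^{\mathrm{PR}}(\lambda, t), 1 \big),$$
which is independent of $\tilde{\mu}$ for any given $(\lambda, t)$. Additionally, we have
$$\hat{\hat{\beta}}^{\mathrm{PR}}(\lambda, t) = \theta(\lambda, t, \tilde{\mu}^*) = e^{\frac{\tilde{\mu}^*}{p + 1 + t}} \, \theta(\lambda, t, 0),$$
where $\tilde{\mu}^* = -(p + 1 + t) \log \theta_{p+1}(\lambda, t, 0)$, and strong duality holds in (24).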
(iii)
For any given values of $\mu > 0$ and $\tilde{\mu} \in \mathbb{R}$, we have that $\hat{\beta}^{\mathrm{PR}}(\lambda, t) = \hat{\hat{\beta}}^{\mathrm{PR}}(\lambda, t)$.
Theorem 2 (i) shows that up to $2^{p+1}$ distinct PR estimates can be identified, each residing in one of the $2^{p+1}$ possible search cones. This is consistent with Assumption 1, where estimators are sought within specific quadrants of the parameter space. For a given $p$, the PR estimates can therefore span all possible quadrants, and (20) provides a systematic procedure to obtain each potential PR solution in this setting.
Theorem 2 (ii) represents a constrained form of (i) and constitutes a special case of the framework discussed in Section 2.3. Theorem 2 (iii) establishes the uniqueness of each PR estimate and its independence from the normalising constants $\mu$ and $\tilde{\mu}$, thereby simplifying computation by eliminating the need for cross-validation over these constants. As a result, any suitable computational method, such as (20), produces a stable, single PR estimate within the chosen cone $\mathcal{K}_{p+2}(\delta)$ across all parameter quadrants when $p > 1$.
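The independence of the PR estimate from the normalising constant can be checked numerically with the following sketch. It is not the savvyPR implementation: the damped Newton solver `barrier_min`, the synthetic data, the choice $\lambda = 0$ and $t = 1$, and the reading of $Z$ as having rows $(\tilde{x}_i^\top, -y_i)$ are all assumptions made for illustration. The snippet solves the log-barrier problem for two values of $\mu$ in the cone defined by the OLS signs and normalises each solution by its last component.

```python
import numpy as np

def barrier_min(M, delta, weights, mu, tol=1e-12, max_iter=200):
    """Minimise th'M th - mu * sum_k weights_k * log(delta_k * th_k) over the cone
    {th : delta_k * th_k > 0} with a damped Newton method (illustrative solver)."""
    B = (delta[:, None] * M) * delta[None, :]    # substitute u = delta * th > 0
    u = np.ones(len(delta))

    def obj(v):
        return v @ B @ v - mu * np.sum(weights * np.log(v))

    for _ in range(max_iter):
        grad = 2.0 * B @ u - mu * weights / u
        hess = 2.0 * B + np.diag(mu * weights / u**2)
        step = np.linalg.solve(hess, -grad)
        if -(grad @ step) < tol:                  # squared Newton decrement
            break
        s = 1.0
        # Backtrack until the iterate stays in the cone and the objective decreases.
        while np.any(u + s * step <= 0) or obj(u + s * step) > obj(u) + 0.25 * s * (grad @ step):
            s *= 0.5
        u = u + s * step
    return delta * u

# Illustrative synthetic data (assumed, not from the paper).
rng = np.random.default_rng(2)
n, p, t = 100, 2, 1.0
X_tilde = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X_tilde @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

beta_ols = np.linalg.lstsq(X_tilde, y, rcond=None)[0]
Z = np.column_stack([X_tilde, -y])                # rows (x~_i', -y_i), assumed sign convention
M = Z.T @ Z / n                                   # quadratic form of RRSS with lambda = 0
delta = np.append(np.sign(beta_ols), 1.0)         # OLS cone, synthetic term kept positive
weights = np.append(np.ones(p + 1), t)            # barrier weights (1, ..., 1, t)

def pr_estimate(mu):
    th = barrier_min(M, delta, weights, mu)
    return th[:-1] / th[-1], th                   # normalise so the last entry is 1

beta_mu1, th1 = pr_estimate(1.0)
beta_mu5, th5 = pr_estimate(5.0)
loss1 = float(th1 @ M @ th1)                      # loss at the mu = 1 solution
contrib = th1 * (2.0 * M @ th1)                   # theta_k * dRRSS/dtheta_k at mu = 1
```

Under these assumptions, `beta_mu1` and `beta_mu5` agree up to solver tolerance, `loss1` is close to $(p + 1 + t)\mu / 2 = 2$, and `contrib` is close to $\mu \cdot (1, 1, 1, t)$, which is the equal-contribution pattern described by the theorem. Any convex solver that respects the cone constraint could replace the Newton iteration used here.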
We are now ready to present the main result of this section, stated as Proposition 2.
Proposition 2.
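Assume that $\hat{\beta}^{\mathrm{RR}}(\lambda)$ and $\hat{\beta}^{\mathrm{OLS}}$ contain only non-zero elements. Let
$$\delta^{\mathrm{RR}} := \mathrm{sgn}\big( \hat{\beta}^{\mathrm{RR}}(\lambda) \big) \quad \text{and} \quad \delta^{\mathrm{OLS}} := \mathrm{sgn}\big( \hat{\beta}^{\mathrm{OLS}} \big),$$
where the signum function is applied componentwise, with $\mathrm{sgn}(a) = 1$ whenever $a > 0$ and $\mathrm{sgn}(a) = -1$ whenever $a < 0$. Thus, $\hat{\beta}^{\mathrm{RR}}(\lambda) \in \mathcal{K}_{p+1}(\delta^{\mathrm{RR}})$ and $\hat{\beta}^{\mathrm{OLS}} \in \mathcal{K}_{p+1}(\delta^{\mathrm{OLS}})$.
(i)
Let $\lambda > 0$ and $t \ge 0$ for which Assumption 1 holds for $\mathcal{K}_{p+1}(\delta^{\mathrm{RR}})$. The PR estimate $\hat{\beta}^{\mathrm{PR}}(\lambda, t) \in \mathcal{K}_{p+1}(\delta^{\mathrm{RR}})$ satisfies
$$\sum_{k=0}^{p} \frac{\hat{\beta}_k^{\mathrm{PR}}(\lambda, t)}{\hat{\beta}_k^{\mathrm{RR}}(\lambda)} \ge 1 .$$
(ii)
Let $\lambda = 0$ and $t \ge 0$ for which Assumption 1 holds for $\mathcal{K}_{p+1}(\delta^{\mathrm{OLS}})$. The PR estimate $\hat{\beta}^{\mathrm{PR}}(0, t) \in \mathcal{K}_{p+1}(\delta^{\mathrm{OLS}})$ satisfies
$$\sum_{k=0}^{p} \frac{\hat{\beta}_k^{\mathrm{PR}}(0, t)}{\hat{\beta}_k^{\mathrm{OLS}}} \ge 1 .$$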
Proposition 2 recommends using either the OLS or RR estimate as a starting point for selecting the search cone in PR estimation. Choosing the cone defined by OLS, $\tilde{\theta}^{\mathrm{OLS}} = \big( \hat{\beta}^{\mathrm{OLS}}, 1 \big) = \big( \hat{\beta}^{\mathrm{RR}}(0), 1 \big)$, provides a non-regularised baseline. Alternatively, selecting the cone containing the RR estimate, $\tilde{\theta}^{\mathrm{RR}} = \big( \hat{\beta}^{\mathrm{RR}}(\lambda), 1 \big)$, offers a regularised approach, where $\lambda$ is chosen via CV, for example, using the glmnet package in R.

2.3. Parity Estimation and Regression—Further Explainability

We have established the main theory of parity estimation and PR in the previous two sections. We now aim to enhance the explainability of these concepts by providing a higher-level description of the theory and by linking PR to well-known penalised regression methods, such as RR and LASSO. To this end, we introduce the concept of the Generalised Weighted Mean (GWM).
The GWM generalises several well-known averaging operations and depends on a parameter $r$. For a given $x \in \mathbb{R}^{p+1}$ and a vector of weights $b = (b_0, b_1, \ldots, b_p)^\top$, where $b_i \ge 0$ and $\mathbf{1}^\top b = 1$, the GWM of order $r$ is defined as
$$m_r(x; b) = \left( \sum_{k=0}^{p} b_k |x_k|^r \right)^{1/r}, \quad \text{for } r \in \mathbb{R} \cup \{\pm\infty\} .$$
It holds that $m_r(x; b) \le m_s(x; b)$ for all $r < s$. The limiting case $r = 0$, corresponding to the weighted geometric mean, is given by
$$m_0(x; b) := \lim_{r \to 0} m_r(x; b) = \prod_{k=0}^{p} |x_k|^{b_k} = \exp\left( \sum_{k=0}^{p} b_k \log |x_k| \right) .$$
For $r = \pm\infty$, we obtain
$$m_{-\infty}(x; b) = \min_{0 \le i \le p} |x_i| \quad \text{and} \quad m_{\infty}(x; b) = \max_{0 \le i \le p} |x_i| .$$
The case $r = -1$ yields the weighted harmonic mean, defined as
$$m_{-1}(x; b) = \left( \sum_{i=0}^{p} \frac{b_i}{|x_i|} \right)^{-1} .$$
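The definitions above translate directly into a short pure-Python function. The name `gwm`, the example vector, and the weights are assumptions for illustration; the function treats $r = 0$ and $r = \pm\infty$ as the limiting cases described in the text and assumes all entries of the input vector are non-zero, as they are inside a search cone.

```python
import math

def gwm(x, b, r):
    """Generalised weighted mean of order r for a vector x with non-zero entries."""
    if any(w < 0 for w in b) or abs(sum(b) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to one")
    a = [abs(v) for v in x]
    if r == math.inf:
        return max(a)                             # limiting case r -> +infinity
    if r == -math.inf:
        return min(a)                             # limiting case r -> -infinity
    if r == 0:
        # Weighted geometric mean, the limit as r -> 0.
        return math.exp(sum(w * math.log(v) for w, v in zip(b, a)))
    return sum(w * v**r for w, v in zip(b, a)) ** (1.0 / r)

x = [1.0, -2.0, 4.0]
b = [0.25, 0.25, 0.5]
orders = [-math.inf, -1, 0, 1, 2, math.inf]
means = [gwm(x, b, r) for r in orders]
```

For this example the harmonic mean ($r = -1$) equals 2 and the arithmetic mean ($r = 1$) equals 2.75, and the list `means` is non-decreasing in the order $r$, in line with the monotonicity property stated above.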
We introduce the Generalised Weighted Mean Constrained (GWMC) estimation framework and note that an equivalent formulation in the context of portfolio theory is discussed in (Asimit et al., forthcoming-a, 2025). The GWMC approach seeks to minimise a given loss function L subject to a constraint on the GWM of the regression parameters:
$$\min_{\theta \in \mathbb{R}^{p+1}} L(\theta) \quad \text{s.t.} \quad \begin{cases} m_r(\theta; b) \le \epsilon, & \text{for } r \ge 1, \\ m_r(\theta; b) \ge \epsilon, & \text{for } r < 1, \end{cases} \tag{29}$$
where $\epsilon > 0$ is a fixed constant. The GWM function $m_r(\theta; b)$ provides flexibility in regularising the parameters through the choice of the order $r$ and weighting vector $b$.
Note that $m_r(\theta; b)$ is convex in $\theta \in \mathcal{K}_{p+1}(\delta)$ when $r \ge 1$, and therefore encompasses a broad class of regularisation schemes. In particular, RR and LASSO arise as special cases of (29) when the loss functional $L$ corresponds to a regression setting. Specifically, RR can be formulated within the GWMC framework with $r = 2$ and $\theta \in \mathcal{K}_{p+1}(\delta)$:
$$m_2(\theta; b) = \left( \sum_{k=0}^{p} b_k |\theta_k|^2 \right)^{1/2} \le \tilde{\lambda},$$
where equal weights are assumed. This formulation coincides with the usual $L_2$-norm constraint in (6). Similarly, LASSO is representable within the GWMC framework with $r = 1$, which imposes an $L_1$-norm constraint on $\theta \in \mathcal{K}_{p+1}(\delta)$. In this case, the GWMC constraint becomes
$$m_1(\theta; b) = \sum_{k=0}^{p} b_k |\theta_k| \le \tilde{t} .$$
The Parity Estimator introduced in (11) and formulated in (12) can be viewed as a limiting case of the GWMC framework in (29) with $r = 0$ and $\theta \in \mathcal{K}_q(\delta)$, which leads to a logarithmic constraint. Specifically, for equal weights $b_k = \frac{1}{q}$, the constraint becomes
$$m_0(\theta; b) := \lim_{r \to 0} m_r(\theta; b) = \exp\left( \sum_{i=1}^{q} b_i \log |\theta_i| \right) \ge e^{\mu},$$
where $\mu$ is a lower-bound parameter. Similarly, the Partial Parity Estimator defined in (13) and reformulated in (14) can also be interpreted within the GWMC framework with $r = 0$ and $\theta \in \mathcal{K}_q(\delta)$. In this case, the logarithmic constraint takes the weighted form
$$m_0(\theta; b) := \lim_{r \to 0} m_r(\theta; b) = \exp\left( \sum_{i=1}^{q_0} b_i \log |\theta_i| + \sum_{i=q_0+1}^{q} b_i \log |\theta_i| \right) \ge e^{\tilde{\mu}},$$
where $\tilde{\mu}$ is the corresponding lower-bound parameter. The weights $b_k$ are defined as
$$b_k = \begin{cases} \dfrac{1}{q_0 + \mathbf{1}^\top t}, & \text{for } k = 1, \ldots, q_0, \\[2ex] \dfrac{t_k}{q_0 + \mathbf{1}^\top t}, & \text{for } k = q_0 + 1, \ldots, q . \end{cases}$$
In summary, the logarithmic constraint underlying parity estimation produces a balanced regularisation effect, distributing the elasticity of the loss function uniformly across the selected parameters. When embedded within the GWMC framework, the parity estimator is conceptually consistent with penalised regression methods such as RR and LASSO, which correspond to the cases where r = 2 and r = 1 , respectively. The fundamental similarity between RR, LASSO and PR lies in their shared objective, to control both the magnitude and distribution of model parameters by imposing constraints on the weighted mean functional. This connection highlights how PR extends the classical regularisation principle by introducing fairness-oriented constraints within a unified GWMC framework.

3. Simulation Study

In this section, we conduct a simulation study to evaluate the finite-sample performance of the PR estimators relative to their primary competitors: (i) OLS as defined in (2), (ii) RR as defined in (5), and (iii) the Liu estimator as defined in (9). Our objective is to assess the robustness of the PR framework under varying degrees of multicollinearity and heteroscedasticity. We exclude the LASSO estimator from the analysis, as our focus is on dense parameter space structures and variable selection is not the purpose of this paper.
This section is organised into two parts. Section 3.1 describes the simulation design and data-generating process and introduces the performance measures used to compare the estimators. Section 3.2 presents and discusses the simulation results.

3.1. Experimental Setup and Methodology

To evaluate the performance of the estimators under different data structures, we employ a data generation process (DGP) designed to simulate complex regression environments characterised by severe multicollinearity and heteroscedasticity. A detailed characterisation of the simulation steps, including the construction of the feature matrix $X \in \mathbb{R}^{n \times p}$ and the heteroscedastic response variable $Y$, is provided in Appendix B.
We compare several distinct estimation methodologies. The OLS estimator serves as the unregularised baseline, while the shrinkage estimators, namely RR and Liu, are also included. The PR framework is implemented in two computationally efficient variants, which differ primarily in their tuning mechanisms. First, the PR estimator with $t$-tuning, denoted by $\mathrm{PR}_t$, is directly motivated by Theorem 2. Its key feature is that, for any fixed $t \ge 0$, the PR estimate is independent of the normalising constant $\mu$. Here, $t$ acts as the relative elasticity weight for the target variable and is selected via CV. Second, the PR estimator with $c$-tuning, denoted by $\mathrm{PR}_c$, allocates a fixed loss contribution to each predictor. Mathematically, this corresponds to the budget-based objective function, where the logarithmic barrier for the $p$ predictors is weighted by $c$, while the response variable’s contribution is determined by $1 - (p+1)c$. To ensure positive risk allocations, $c$ is constrained such that $0 \le c < 1/(p+1)$. Although $c$ and $t$ are functionally connected through their role in balancing loss contributions, $\mathrm{PR}_c$ operates within a strictly bounded domain, thereby providing an alternative numerical approach to the risk-parity problem. We benchmark both $\mathrm{PR}_t$ and $\mathrm{PR}_c$ against OLS, RR, and the Liu estimator, defining the initial search cone for the PR algorithms using the signs of the OLS coefficients, as indicated by Proposition 2. All tuning parameters ($c$, $t$, $\lambda$, or $d$) are selected via 10-fold CV.
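One possible reading of the link between $c$ and $t$ is sketched below. It assumes that the budget $c$ assigned to each of the $p+1$ coefficients (intercept included) coincides with the normalised weight $1/(p+1+t)$ from the GWMC formulation, so that the response share $1-(p+1)c$ equals $t/(p+1+t)$. This mapping is an assumption made for illustration and is not a formula stated in the text; the function names are hypothetical.

```python
def t_from_c(c, p):
    """Assumed mapping: c = 1 / (p + 1 + t)  =>  t = 1 / c - (p + 1)."""
    if not 0.0 < c < 1.0 / (p + 1):
        raise ValueError("this mapping needs 0 < c < 1 / (p + 1)")
    return 1.0 / c - (p + 1)

def c_from_t(t, p):
    """Inverse of the assumed mapping; t = 0 gives the boundary value 1 / (p + 1)."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return 1.0 / (p + 1 + t)

def response_share(c, p):
    """Share of the loss budget left to the response: 1 - (p + 1) c."""
    return 1.0 - (p + 1) * c

# Hypothetical example with p = 9 predictors
p = 9
c = 0.05
t = t_from_c(c, p)              # 1 / 0.05 - 10 = 10
share = response_share(c, p)    # 0.5, which matches t / (p + 1 + t) = 10 / 20
```

Under this reading, small values of $c$ correspond to large values of $t$, so a grid on the bounded interval for $c$ explores the same family of solutions as an unbounded grid for $t$.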
To evaluate the accuracy of the proposed estimators in the simulation study, we measure their performance using the $L_2$-distance between the true regression parameter vector $\beta$ and the estimated vector $\hat{\beta}$. This corresponds to the estimated MSE of $\beta$, which is appropriate since low estimation error aligns with low theoretical MSE. For a single simulation run, the $L_2$-error is defined as
$$L_2 = \|\hat{\beta} - \beta\|_2 = \sqrt{\sum_{j=1}^{m} (\hat{\beta}_j - \beta_j)^2},$$
where $m$ denotes the number of covariates, including the intercept.
We report the average $L_2$-distance across $N = 1000$ repetitions for each scenario. To assess the inherent variability and spread of these distances, we also report the standard deviation (SD). Following the convention in our numerical results, the SD is expressed as a percentage to facilitate comparison of the performance of different estimators and is calculated as follows:
$$\mathrm{SD} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left(L_2^{(i)} - \bar{L}_2\right)^2},$$
where $L_2^{(i)}$ represents the $L_2$-distance in the $i$-th simulation repetition, and $\bar{L}_2$ is the sample mean of these distances across all $N$ repetitions. A lower average $L_2$-distance indicates greater accuracy in estimating the true regression parameters.
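The two summary quantities can be computed as in the following minimal sketch. The function names and the toy inputs are hypothetical; the SD uses the $N-1$ denominator, as in the formula above.

```python
import math
import statistics

def l2_distance(beta_hat, beta):
    """Euclidean distance between the estimated and the true coefficient vectors."""
    return math.sqrt(sum((bh - b) ** 2 for bh, b in zip(beta_hat, beta)))

def summarise(distances):
    """Average L2-distance and sample SD (N - 1 denominator) across repetitions."""
    return statistics.mean(distances), statistics.stdev(distances)

# Toy example: one run, then a summary over three hypothetical repetitions
single_run = l2_distance([1.0, 2.0], [1.0, 0.0])
avg_l2, sd_l2 = summarise([1.0, 2.0, 3.0])
```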

3.2. Discussion of Simulation Results

We now discuss the comparative performance of the two PR variants ($\mathrm{PR}_c$ and $\mathrm{PR}_t$) relative to the OLS, RR, and Liu estimators. The detailed results are reported in Table 1, while Table 2 aggregates them to facilitate identification of the main trends.
Overall, the simulation results indicate that PR improves on OLS and frequently on the shrinkage estimators by stabilising parameter estimates in high-correlation environments. While OLS exhibits substantial variance inflation as $|\rho|$ increases, both $\mathrm{PR}_c$ and $\mathrm{PR}_t$ achieve their strongest performance under high negative correlation ($\rho = -0.75$ and $-0.5$), frequently attaining the lowest $L_2$-distances across all panels. Even in settings with strong positive correlation, PR remains superior or highly competitive relative to OLS. Moreover, as dimensionality increases (Panel C, $m = 25$), $\mathrm{PR}_c$ emerges as the dominant estimator across nearly all correlation levels. These findings suggest that the parity constraint provides an effective regularisation mechanism for high-dimensional models characterised by dense parameter structures and strong interdependence among features. It is important to note that when the number of covariates is very low ($m = 2$) and the sample size is small ($n/m = 10$ and 25), the PR methods sometimes perform worse than, or similarly to, traditional shrinkage methods. In such low-dimensional environments, the risk-balancing mechanism of PR provides less marginal benefit. However, as the dimensionality and sample size increase, the structural advantages of PR become evident. The traditional shrinkage estimators (RR and Liu) also perform quite similarly to the PR framework in scenarios with a lower number of covariates or moderate volatility. The main divergence in performance occurs in high-dimensional settings (e.g., $m = 25$), where the parity constraints prevent the overshrinkage or instability that affects RR and Liu.
Table 2 provides a synthesised overview of the results reported in Table 1 by aggregating the “best” and “second-best” performances across all 60 scenarios. The summary highlights the consistency of the PR framework: the combined PR approach is the top-performing model in 41.7% of the scenarios and accounts for 43.3% of all best and second-best placements (52 out of 120).
In summary, the simulation results indicate that the PR estimator is comparable to the benchmark estimators overall and performs particularly well in settings characterised by severe multicollinearity and high dimensionality. By imposing structural balance on the coefficients via parity constraints, the proposed method achieves a favourable bias–variance trade-off compared to traditional shrinkage estimators across various dependency structures.

4. Real Data Analysis

We evaluate the empirical performance of the PR estimators ($\mathrm{PR}_c$, $\mathrm{PR}_t$) using data from the West Texas Intermediate (WTI) and Brent crude oil markets. These commodities serve as the primary global benchmarks for oil pricing and are characterised by high volatility and frequent structural shifts. This environment provides a setting to assess the stability of the parity framework relative to OLS, RR, and the Liu estimator across diverse market regimes. Section 4.1 details the dataset and the underlying factor model. Section 4.2 outlines the OOS evaluation structure and corresponding error metrics before presenting the empirical results.

4.1. Background and Dataset Description

Our analysis utilises monthly total returns for WTI and Brent crude oil sourced from Bloomberg. Although the raw data begin in January 1980, the necessity of observing all factors concurrently restricts our final sample periods to: (i) WTI, August 1988 to September 2024 (434 observations); and (ii) Brent, April 1998 to September 2024 (318 observations). A detailed technical discussion of benchmark selection and data source characteristics is provided in Appendix C.1.
Following the methodology of Sakkas and Tessaromatis (2020), we model monthly excess returns using a factor-based specification. The model is given by
$$y_{t+1} = \beta_0 + \sum_{j=1}^{p} \beta_j X_{j,t} + \epsilon_{t+1},$$
where $y_{t+1}$ denotes the monthly excess return (over the risk-free rate) at time $t+1$, and $X_{j,t}$ represents the $j$-th normalised factor observed at time $t$.
For WTI, the model includes nine factors ($p = 9$): Momentum, Basis, Basis Momentum, Skewness, Inflation Beta, Volatility, Hedging Pressure, Open Interest, and Value. For Brent, Hedging Pressure is excluded due to data limitations, resulting in $p = 8$. To guarantee numerical stability and comparability across variables measured in different units, all covariates are uniformly scaled to the interval $[0, 1]$, whereas the response variable $y$ remains unscaled. Definitions and economic motivations for each factor are provided in Appendix C.2, and the explicit construction formulas follow the framework established in Sakkas and Tessaromatis (2020).
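A common way to scale a covariate to $[0, 1]$ is min-max scaling, sketched below. Whether the scaling is computed on the full sample or on each training window is not specified here, so the function is shown for a single list of values; the name `min_max_scale` and the handling of constant columns are assumptions.

```python
def min_max_scale(values):
    """Map a list of numbers linearly onto [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant factor carries no information, so return zeros
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical factor values
scaled = min_max_scale([2.0, 4.0, 6.0])
```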
Figure A1 and Figure A2 present the monthly returns and cumulative performance of both benchmarks. To account for market shifts over time, we apply the endogenous structural breakpoint test of Bai and Perron (2003) to divide the entire sample into distinct economic periods. This procedure identifies 12 regimes for WTI and 11 for Brent, including major high-volatility episodes such as: (i) the Global Financial Crisis and subsequent recovery, spanning September 2008 to May 2011 for WTI and September 2008 to January 2011 for Brent; and (ii) the COVID-19 pandemic and its aftermath, covering March 2020 to July 2022 for WTI and February 2020 to July 2022 for Brent. A complete timeline of these identified periods, along with the specific market events that define them, is given in Appendix C.3.

4.2. Data Analysis

The OOS performance of all estimators is evaluated using an expanding training-window framework. This approach is anchored to the structural regimes identified in Section 4.1 and further detailed in Appendix C.3. For each transition from Period $i$ to Period $i+1$, the five regression models are estimated using all available data up to the end of Period $i$. Their predictive performance is then evaluated over the testing window corresponding to the entirety of Period $i+1$.
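The expanding-window scheme can be sketched as follows. The function name and the representation of regimes by exclusive end indices are assumptions for illustration; the actual regime dates are those listed in Appendix C.3.

```python
def expanding_window_splits(period_ends):
    """Build (train, test) index ranges from exclusive end indices of each period.

    period_ends[i] is the index one past the last observation of period i.
    Training uses everything up to the end of period i; testing covers period i + 1.
    """
    splits = []
    for i in range(len(period_ends) - 1):
        train = range(0, period_ends[i])
        test = range(period_ends[i], period_ends[i + 1])
        splits.append((train, test))
    return splits

# Hypothetical sample of 9 observations split into three periods
splits = expanding_window_splits([3, 5, 9])
```

With $K$ periods this yields $K-1$ train/test pairs, and each training window contains the previous one.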
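Given a testing window of length $n$, observed excess returns $y_t$, and corresponding predictions $\hat{y}_t$, OOS performance is assessed using three standard metrics:
$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|.$$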
Root Mean Squared Error (RMSE) imposes a quadratic loss, thereby penalising large forecast errors more heavily and highlighting model instability during market shocks. In contrast, the Mean Absolute Error (MAE) applies a linear loss and is therefore more robust to extreme observations. While OLS and shrinkage estimators may experience variance inflation during turbulent market regimes, the PR framework incorporates a parity-based constraint designed to enhance predictive stability. A detailed discussion of the comparative stability of PR relative to penalised methods across these regimes is provided in Appendix C.4.
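The contrast between quadratic and linear loss can be seen in a short sketch. The function names and the toy series are hypothetical; the example contains a single large forecast error.

```python
import math

def mse(y, y_hat):
    """Mean squared forecast error over the testing window."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y, y_hat))

def mae(y, y_hat):
    """Mean absolute forecast error over the testing window."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

# Toy window with one large miss on the second observation
y_obs = [1.0, 2.0, 3.0]
y_pred = [1.0, 4.0, 3.0]
window_rmse = rmse(y_obs, y_pred)
window_mae = mae(y_obs, y_pred)
```

Here the RMSE exceeds the MAE because the single error of size 2 is squared before averaging.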
Before evaluating the OOS forecasting performance, it is important to note the dependence structure among the covariates. An analysis of the overall correlation matrices provided in Appendix C.5 reveals moderate to high multicollinearity among several factors in both the WTI and Brent datasets. This interrelated covariate structure motivates the use of regularised estimators, including RR, Liu, and PR, to stabilise the coefficient estimates and prevent the variance inflation that can degrade the OOS performance of OLS in such conditions.
Table 3 and Table 4 present the OOS predictive performance for WTI and Brent crude oil excess returns across 12 and 11 distinct economic periods, respectively. Although no single estimator consistently outperforms across all periods, the results demonstrate that our PR estimators provide greater stability during regimes of extreme market volatility and structural shifts, where traditional benchmarks often exhibit pronounced instability.
For the WTI dataset, the summary rows indicate that $\mathrm{PR}_c$ attains the highest number of best and second-best RMSE performances across the evaluated periods. Examining specific high-volatility events, the performance of the estimators varies. During the Global Financial Crisis (Period 6), all methods struggle, with OLS achieving the lowest RMSE of 38.94%; however, $\mathrm{PR}_c$ secures the second-best overall position, notably outperforming the other shrinkage estimators. This illustrates that even when standard shrinkage methods falter, the parity constraint enhances predictive robustness. In the subsequent recovery phase (Period 7), $\mathrm{PR}_t$ attains the lowest RMSE of 11.27%, outperforming all other competitors. During the COVID-19 pandemic (Period 11), the parity-based framework demonstrates structural resilience: $\mathrm{PR}_c$ achieves the minimum RMSE of 28.51%, while $\mathrm{PR}_t$ ranks second-best at 29.00%, whereas RR, Liu, and OLS fail to maintain comparable stability.
A similar pattern is observed for the Brent dataset, where the summary counts indicate that $\mathrm{PR}_c$ again secures the highest number of best RMSE performances, while $\mathrm{PR}_t$ achieves the most second-best rankings. During the Financial Crisis (Period 5), $\mathrm{PR}_c$ emerges as the most robust estimator, delivering the lowest RMSE of 66.33%. In contrast, traditional shrinkage methods such as RR and Liu suffer from substantial errors, performing even worse than the second-best OLS. During the subsequent recovery phase (Period 6), RR attains the lowest RMSE at 22.98%, with $\mathrm{PR}_c$ closely following as the second-best model at 26.13%. Most notably, during the extreme volatility of the COVID-19 pandemic (Period 10), $\mathrm{PR}_c$ proves highly reliable, achieving a remarkably low RMSE of 28.44%. In this extreme regime, conventional methods fail: OLS produces an RMSE of 163.04%, while RR and Liu incur errors of 61.51% and 72.91%, respectively.
Consistent with our findings from the simulation study, the relative performance of $\mathrm{PR}_t$ and $\mathrm{PR}_c$ varies across market regimes. While $\mathrm{PR}_t$ exhibits particular strength in capturing trends during the WTI recovery phase, $\mathrm{PR}_c$ demonstrates strong robustness in the most volatile and ill-conditioned periods, such as the Financial Crisis and the COVID-19 pandemic, across both commodities. Furthermore, the empirical results confirm our previous observations regarding RR and Liu. During relatively stable market regimes, RR and Liu perform quite competitively and yield similar predictive accuracy to PR. However, during the aforementioned periods of extreme volatility and structural breaks, these estimators frequently become unstable due to their reliance on a single global penalty parameter. By distributing risk evenly across the parameters, PR prevents overshrinkage and maintains robust OOS predictions even in these chaotic environments. Overall, the results presented in Table 3 and Table 4 underscore that parity-based regression provides a critical mechanism for ensuring parameter stability, rendering the PR framework a more reliable alternative to conventional shrinkage methods, such as RR or Liu, when forecasting oil returns under severe global shocks.

5. Conclusions

We have introduced Parity Regression, a novel multiple linear regression framework that replaces aggregate error minimisation with an elasticity-based principle that distributes prediction error evenly across model parameters. This formulation induces a structurally balanced regularisation mechanism that is particularly well-suited to noisy environments, including time series settings characterised by structural shifts and evolving dynamics. We provided a rigorous theoretical characterisation of the new estimator, establishing its existence, uniqueness, and structural properties, and demonstrated that it can be embedded within a unified Generalised Weighted Mean Constrained framework that also encompasses classical penalised and shrinkage estimators.
By reinterpreting regularisation through elasticity balancing rather than norm penalisation alone, Parity Regression offers a conceptually distinct yet mathematically coherent extension of existing methodology. Theoretical guarantees are corroborated by simulation studies and real-data applications, which confirm its stability and competitive performance. Collectively, these results position Parity Regression as a substantive methodological advancement with strong foundations for further analytical and high-dimensional development.

Author Contributions

Conceptualization, V.A. and Z.C.; Methodology, V.A., Z.C. and P.M.; Software, Z.C. and B.I.; Validation, V.A., B.I. and P.M.; Formal analysis, V.A.; Investigation, B.I.; Resources, B.I.; Data curation, Z.C.; Writing – original draft, V.A., Z.C. and P.M.; Visualization, Z.C.; Supervision, V.A. and P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to proprietary licensing restrictions from the third-party provider (Bloomberg Finance L.P.). The data were accessed via a university institutional terminal, which prohibits the redistribution of raw data to the public.

Acknowledgments

The authors express their sincere gratitude to Alexandru Bădescu (University of Calgary) for his valuable comments and constructive guidance throughout the theoretical development and implementation phases of this research. His insights have materially contributed to the rigour and clarity of the final manuscript. The third author would like to thank the Informational Buildup Foundation (IBF) and the Simion Stoilow Institute of Mathematics of the Romanian Academy for the support provided through an IBF Research Fellowship.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proofs

Appendix A.1. Proof of Theorem 1

We first prove part (i). The first-order necessary conditions for any solution $\theta$ of (12) lead to
$$\theta_k \frac{\partial L}{\partial \theta_k}(\theta) = \mu, \quad \text{for all } 1 \le k \le q,$$
which further implies that $\theta$ satisfies (11).
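For part (ii), we first demonstrate that (12) has a solution. Let $f(\theta; \mu)$ be the objective function of (12). We begin by showing that
$$\lim_{\|\theta\| \to \infty} f(\theta; \mu) = +\infty \quad \text{for any } \mu > 0. \tag{A1}$$
Indeed,
$$\begin{aligned} f(\theta; \mu) &= L(\theta) - \mu \sum_{k=1}^{q} \log(\delta_k \theta_k) \\ &\ge M_1 \|\theta\| - \mu \sum_{k=1}^{q} \log(\delta_k \theta_k) && \text{for } \|\theta\| \text{ sufficiently large} \\ &\ge M_1 \|\theta\| - \mu q \log \|\theta\| \\ &\to \infty \quad \text{as } \|\theta\| \to \infty. \end{aligned}$$
Next, we show that
$$\lim_{\underline{\theta} \to 0} f(\theta; \mu) = +\infty, \quad \text{for any } \mu > 0,$$
where $\underline{\theta} = \min_i |\theta_i|$. This follows from
$$\lim_{\underline{\theta} \to 0} \sum_{k=1}^{q} \log(\delta_k \theta_k) = -\infty$$
and the fact that $L(\theta) > 0$ for any $\theta \in K_q(\delta)$. Thus, there exist $\epsilon > 0$ and $K > 0$ such that
$$\inf_{\theta \in K_q(\delta)} f(\theta; \mu) = \inf_{\theta \in B_{\epsilon, K}} f(\theta; \mu),$$
where
$$B_{\epsilon, K} = \{\theta \in K_q(\delta) : \underline{\theta} \ge \epsilon, \ \|\theta\| \le K\}.$$
The conclusion follows since $B_{\epsilon, K}$ is compact and $f$ is continuous.
Finally, note that $f(\theta; \mu)$ is strictly convex in $\theta$ since $\mu > 0$, $L$ is convex and $\log(\delta_k \theta_k)$ is strictly concave in $\theta_k$ for all $k$. Hence, (12) has a unique solution.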
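For part (iii), we begin by proving that (12) admits a unique solution. Since the proof is the same as part (ii) except for (A1), we only show that (A1) holds. Since $L$ is positive and homogeneous of order $\tau$, then for any $0 < \zeta < \tau$, we have that $L(\theta) \ge \|\theta\|^{\tau - \zeta}$ for those $\theta$ such that $\|\theta\|$ is sufficiently large. Thus,
$$\begin{aligned} f(\theta; \mu) &= L(\theta) - \mu \sum_{k=1}^{q} \log(\delta_k \theta_k) \\ &\ge C \|\theta\|^{\tau} - \mu q \log \|\theta\| && \text{for } \|\theta\| \text{ sufficiently large} \\ &\to \infty \quad \text{as } \|\theta\| \to \infty, \end{aligned}$$
which gives the needed result.
We now show that $L(\theta(\mu)) = q\mu/\tau$ for all $\mu > 0$. Since $\theta(\mu)$ is a solution of (12), the first-order conditions give
$$\theta_k \frac{\partial L(\theta)}{\partial \theta_k} \bigg|_{\theta = \theta(\mu)} = \mu \quad \text{for any } 1 \le k \le q.$$
The latter and Euler’s Homogeneous Function Theorem yield $L(\theta(\mu)) = q\mu/\tau$.
Next, we show that $\theta(\mu) = \mu^{1/\tau} \theta(1)$ for all $\mu > 0$. The homogeneity of $L$ implies that
$$f\left(\mu^{-1/\tau} \theta; 1\right) = \frac{1}{\mu} f(\theta; \mu) + \frac{q}{\tau} \log \mu \quad \text{for any } \theta \in K_q(\delta) \text{ and } \mu > 0,$$
which further leads to
$$\operatorname*{argmin}_{\theta} f\left(\mu^{-1/\tau} \theta; 1\right) = \operatorname*{argmin}_{\theta} f(\theta; \mu) = \theta(\mu),$$
as $\mu > 0$. It can easily be seen that
$$\mu^{1/\tau} \theta(1) = \operatorname*{argmin}_{\theta} f\left(\mu^{-1/\tau} \theta; 1\right),$$
which shows that $\theta(\mu) = \mu^{1/\tau} \theta(1)$.
Lastly, we prove the final statement of part (iii). If $\tilde{\theta} = \mu_0^{1/\tau} \theta(1)$ for some $\mu_0 > 0$, then $\tilde{\theta} = \theta(\mu_0)$ and, by part (i), $\tilde{\theta}$ is a parity estimator. Conversely, assume that $\tilde{\theta}$ is a parity estimator so that there exists $\mu_0 > 0$ such that
$$\theta_k \frac{\partial L(\theta)}{\partial \theta_k} \bigg|_{\theta = \tilde{\theta}} = \mu_0 \quad \text{for all } 1 \le k \le q.$$
Euler’s Theorem gives $L(\tilde{\theta}) = q\mu_0/\tau = L(\theta(\mu_0))$. Thus, $\tilde{\theta} = \theta(\mu_0) = \mu_0^{1/\tau} \theta(1)$ by the uniqueness of the solution of (12). The proof is now complete.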
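The properties in part (iii) can be checked numerically on a toy problem. The sketch below is an illustration only, under stated assumptions: the loss is the quadratic form $L(\theta) = \theta^\top A \theta$ (homogeneous of order $\tau = 2$) with a hypothetical positive definite matrix `A`, the cone is the positive orthant ($\delta_k = 1$ for all $k$), and the minimiser is found by a simple coordinate descent that is not claimed to be the authors' algorithm.

```python
import math

def solve_parity(A, mu, sweeps=500):
    """Minimise theta'A theta - mu * sum_k log(theta_k) over theta > 0."""
    q = len(A)
    theta = [1.0] * q
    for _ in range(sweeps):
        for k in range(q):
            # Cross terms sum_{j != k} A[k][j] * theta[j] with the other coordinates held fixed
            c = sum(A[k][j] * theta[j] for j in range(q) if j != k)
            # Coordinate-wise first-order condition: 2*A_kk*x^2 + 2*c*x - mu = 0, positive root
            theta[k] = (-c + math.sqrt(c * c + 2.0 * A[k][k] * mu)) / (2.0 * A[k][k])
    return theta

def loss(A, theta):
    """Quadratic loss L(theta) = theta'A theta."""
    q = len(A)
    return sum(theta[i] * A[i][j] * theta[j] for i in range(q) for j in range(q))

def contributions(A, theta):
    """Elasticity-type terms theta_k * dL/dtheta_k = theta_k * (2 A theta)_k."""
    q = len(A)
    return [theta[k] * 2.0 * sum(A[k][j] * theta[j] for j in range(q)) for k in range(q)]

# Hypothetical symmetric, diagonally dominant (hence positive definite) matrix
A = [[2.0, 0.5, 0.1],
     [0.5, 1.0, 0.3],
     [0.1, 0.3, 1.5]]

theta_1 = solve_parity(A, 1.0)
theta_4 = solve_parity(A, 4.0)
parity_terms = contributions(A, theta_4)   # each entry should be close to mu = 4
loss_at_4 = loss(A, theta_4)               # should be close to q * mu / tau = 3 * 4 / 2
```

In this toy setting, each term $\theta_k \, \partial L / \partial \theta_k$ equals $\mu$, the loss at the optimum equals $q\mu/\tau$, and `theta_4` is `theta_1` rescaled by $\mu^{1/\tau} = 2$.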

Appendix A.2. Proof of Theorem 2

We begin by proving part (i). Based on the proof of Theorem 1, the optimisation problem in (20) has a unique solution $\left(\theta(\lambda, t, \mu)^\top, \theta_{p+1}(\lambda, t, \mu)\right)$ for any parameters $(\lambda, t, \mu)$ satisfying $\lambda \ge 0$, $t \ge 0$, and $\mu > 0$. This solution also fulfils the parity conditions specified in (11). Define $\mathrm{RRSSC}_k$ for each $k$ as follows
$$\mathrm{RRSSC}_k\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu); \lambda\right) = \theta_k \frac{\partial\, \mathrm{RRSS}\left(\theta, \theta_{p+1}; \lambda\right)}{\partial \theta_k} \bigg|_{\theta = \theta(\lambda, t, \mu)},$$
for all $0 \le k \le p+1$. Since $\left(\theta(\lambda, t, \mu)^\top, \theta_{p+1}(\lambda, t, \mu)\right)$ is the unique solution, it must satisfy the stationary conditions
$$\begin{cases} \mathrm{RRSSC}_k\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu); \lambda\right) = \mu, & \text{for } k = 0, \dots, p, \\ \mathrm{RRSSC}_k\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu); \lambda\right) = \mu t, & \text{for } k = p+1. \end{cases} \tag{A4}$$
Applying Euler’s homogeneous function theorem, and RRSS being a homogeneous function of order $\tau = 2$, we can express
$$\mathrm{RRSS}\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu); \lambda\right) = \frac{1}{2} \sum_{k=0}^{p+1} \mathrm{RRSSC}_k\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu); \lambda\right) = \frac{1}{2} \left(\sum_{k=0}^{p} \mu + \mu t\right) = \frac{(p+1+t)\mu}{2}.$$
We find that for each $0 \le k \le p$,
$$\mathrm{RRSSC}_k\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu); \lambda\right) = \frac{2}{p+1+t}\, \mathrm{RRSS}\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu); \lambda\right).$$
Since $\mathrm{RRSS}(\theta, \theta_{p+1}; \lambda)$ is homogeneous, $\mathrm{RRSSC}_k$ is also homogeneous of the same order, and in turn, the following holds for all $0 \le k \le p$ and any $m > 0$
$$\mathrm{RRSSC}_k\left(m\theta(\lambda, t, \mu), m\theta_{p+1}(\lambda, t, \mu); \lambda\right) = \frac{2\, \mathrm{RRSS}\left(m\theta(\lambda, t, \mu), m\theta_{p+1}(\lambda, t, \mu); \lambda\right)}{p+1+t}. \tag{A5}$$
Setting $m = 1/\theta_{p+1}(\lambda, t, \mu)$, we conclude that $\left(\theta(\lambda, t, \mu)^\top, \theta_{p+1}(\lambda, t, \mu)\right)/\theta_{p+1}(\lambda, t, \mu)$ is the unique PR estimate as defined in (21) within $K_{p+2}(\delta)$, thereby satisfying the parity condition in (13).
To show that $\left(\theta(\lambda, t, \mu)^\top, \theta_{p+1}(\lambda, t, \mu)\right)/\theta_{p+1}(\lambda, t, \mu)$ is constant with respect to $\mu > 0$ for any given $(\lambda, t)$, consider the objective function of (20), denoted by $H\left(\theta, \theta_{p+1}; \lambda, t, \mu\right)$. We find that
$$H\left(\mu^{-1/2}\theta, \mu^{-1/2}\theta_{p+1}; \lambda, t, 1\right) = \frac{1}{\mu} H\left(\theta, \theta_{p+1}; \lambda, t, \mu\right) + \frac{p+1+t}{2} \log \mu,$$
which implies
$$\operatorname*{argmin}_{(\theta, \theta_{p+1})} H\left(\mu^{-1/2}\theta, \mu^{-1/2}\theta_{p+1}; \lambda, t, 1\right) = \operatorname*{argmin}_{(\theta, \theta_{p+1})} H\left(\theta, \theta_{p+1}; \lambda, t, \mu\right), \tag{A6}$$
yielding a unique solution $\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu)\right)$ for $\mu > 0$ and $t \ge 0$. Moreover,
$$\operatorname*{argmin}_{(y, y_{p+1})} H\left(y, y_{p+1}; \lambda, t, 1\right) = \left(\theta(\lambda, t, 1), \theta_{p+1}(\lambda, t, 1)\right),$$
where $\left(\theta(\lambda, t, 1), \theta_{p+1}(\lambda, t, 1)\right)$ is the optimal solution in (20) with $\mu = 1$. Together with (A6), we obtain
$$\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu)\right) = \mu^{1/2} \left(\theta(\lambda, t, 1), \theta_{p+1}(\lambda, t, 1)\right) \quad \text{for any } \mu > 0. \tag{A7}$$
Thus,
$$\frac{\left(\theta(\lambda, t, \mu), \theta_{p+1}(\lambda, t, \mu)\right)}{\theta_{p+1}(\lambda, t, \mu)} = \frac{\left(\theta(\lambda, t, 1), \theta_{p+1}(\lambda, t, 1)\right)}{\theta_{p+1}(\lambda, t, 1)} \quad \text{for any } \mu > 0, \tag{A8}$$
showing that our PR estimates do not depend on the normalising constant $\mu$. Choosing $\mu^{*} > 0$ such that $\theta_{p+1}(\lambda, t, \mu^{*}) = 1$, (A7) leads to the required result displayed in (23). Furthermore, (A7) and (A8) yield
$$\left(\theta(\lambda, t, \mu^{*})^\top, 1\right) = (\mu^{*})^{1/2} \left(\theta(\lambda, t, 1)^\top, \theta_{p+1}(\lambda, t, 1)\right) = \frac{\left(\theta(\lambda, t, 1)^\top, \theta_{p+1}(\lambda, t, 1)\right)}{\theta_{p+1}(\lambda, t, 1)},$$
which gives $\mu^{*} = \theta_{p+1}(\lambda, t, 1)^{-2}$. The rest of the proof of part (i) is straightforward, since it relies on arguments similar to those above.
Next, we prove part (ii). We begin by noting that (24) is a convex optimisation problem because the objective function $\mathrm{RRSS}(\theta, \theta_{p+1}; \lambda)$ is strictly convex in $(\theta, \theta_{p+1})$. To establish the existence of a solution, we observe that: (i) the objective function grows unboundedly near infinity (as shown in the proof of Theorem 1), and (ii) any point $(\theta, \theta_{p+1}) \in K_{p+2}(\delta)$ becomes infeasible when $\theta_k \le \epsilon$ for sufficiently small $\epsilon$, since $\lim_{s \to 0^{+}} \log s = -\infty$. This implies that the feasible set excludes points of $K_{p+2}(\delta)$ where any $\theta_k$ approaches zero, effectively restricting the solution to a bounded subset of $K_{p+2}(\delta)$. Given this compactness and the strict convexity of $\mathrm{RRSS}(\theta, \theta_{p+1}; \lambda)$, a solution is guaranteed to exist.
We now proceed to show that (24) admits a unique solution, which requires two main steps.
First, we show that the inequality constraint in (24) is binding (becomes an identity) at any optimal solution $\left(z^{*\top}, z^{*}_{p+1}\right)$ of (24). If the constraint were not binding, we could define
$$\kappa = \exp\left(\tilde{\mu} - \sum_{k=0}^{p} \log\left(\delta_k z^{*}_k\right) - t \log\left(\delta_{p+1} z^{*}_{p+1}\right)\right),$$
and note that $0 < \kappa < 1$, as the constraint would be non-binding at $\left(z^{*\top}, z^{*}_{p+1}\right)$. Then,
$$\mathrm{RRSS}\left(\kappa z^{*}, \kappa z^{*}_{p+1}; \lambda\right) = \kappa^2\, \mathrm{RRSS}\left(z^{*}, z^{*}_{p+1}; \lambda\right) < \mathrm{RRSS}\left(z^{*}, z^{*}_{p+1}; \lambda\right),$$
where the first equality is due to the homogeneity of order 2 of $\mathrm{RRSS}(\cdot\,; \lambda)$ on $K_{p+2}(\delta)$, and the inequality follows from $0 < \kappa < 1$, (17), and (18). This contradicts our assumption that $\left(z^{*\top}, z^{*}_{p+1}\right)$ is a solution of (24), implying that any solution of (24) must satisfy the constraint as an identity.
Next, assume that there exist two distinct solutions, $\left(z^{*\top}, z^{*}_{p+1}\right)$ and $\left(z^{**\top}, z^{**}_{p+1}\right)$, for a given tuple $(\lambda, t, \tilde{\mu})$. Define
$$\left(\bar{z}^\top, \bar{z}_{p+1}\right) = \gamma \left(z^{*\top}, z^{*}_{p+1}\right) + (1 - \gamma) \left(z^{**\top}, z^{**}_{p+1}\right) \quad \text{where } 0 < \gamma < 1.$$
Since (24) is a convex problem, $\left(\bar{z}^\top, \bar{z}_{p+1}\right)$ solves (24). Now, we find
$$\begin{aligned} \sum_{k=0}^{p} \log\left(\delta_k \bar{z}_k\right) + t \log\left(\delta_{p+1} \bar{z}_{p+1}\right) &> \gamma \left(\sum_{k=0}^{p} \log\left(\delta_k z^{*}_k\right) + t \log\left(\delta_{p+1} z^{*}_{p+1}\right)\right) \\ &\quad + (1 - \gamma) \left(\sum_{k=0}^{p} \log\left(\delta_k z^{**}_k\right) + t \log\left(\delta_{p+1} z^{**}_{p+1}\right)\right) = \tilde{\mu}, \end{aligned} \tag{A9}$$
since $\log(\cdot)$ is strictly concave on $\mathbb{R}_{+}$ and both $\left(z^{*\top}, z^{*}_{p+1}\right)$ and $\left(z^{**\top}, z^{**}_{p+1}\right)$ satisfy the constraint as an identity. However, since $\left(\bar{z}^\top, \bar{z}_{p+1}\right)$ solves (24), it must also satisfy the constraint as an identity, leading to a contradiction with (A9). This completes the proof of uniqueness for (24).
Further, by Euler’s homogeneous function theorem and the Karush–Kuhn–Tucker (KKT) conditions, we have
$$\mathrm{RRSSC}_k\left(\theta(\lambda, t, \tilde{\mu}), \theta_{p+1}(\lambda, t, \tilde{\mu}); \lambda\right) = \frac{2}{p+1+t}\, \mathrm{RRSS}\left(\theta(\lambda, t, \tilde{\mu}), \theta_{p+1}(\lambda, t, \tilde{\mu}); \lambda\right),$$
which, together with the equivalent variant of (A5), implies that
$$\left(\theta(\lambda, t, \tilde{\mu})^\top, \theta_{p+1}(\lambda, t, \tilde{\mu})\right) / \theta_{p+1}(\lambda, t, \tilde{\mu})$$
is the unique PR estimate as defined in (25) within $K_{p+2}(\delta)$, fulfilling the condition in (13).
We now demonstrate that $\left(\theta(\lambda, t, \tilde{\mu})^\top, \theta_{p+1}(\lambda, t, \tilde{\mu})\right)/\theta_{p+1}(\lambda, t, \tilde{\mu})$ is constant with respect to $\tilde{\mu} \in \mathbb{R}$ for any fixed $(\lambda, t)$. For any tuple $(\lambda, t, \tilde{\mu})$, the unique solution of (24), denoted by $\left(\theta(\lambda, t, \tilde{\mu})^\top, \theta_{p+1}(\lambda, t, \tilde{\mu})\right)$, can equivalently be obtained by solving
$$\begin{aligned} \min_{(\theta, \theta_{p+1}) \in K_{p+2}(\delta)} \ & \mathrm{RRSS}\left(e^{-\frac{\tilde{\mu}}{p+1+t}} \theta, \ e^{-\frac{\tilde{\mu}}{p+1+t}} \theta_{p+1}; \lambda\right) \\ \text{s.t.} \ & \sum_{k=0}^{p} \log\left(\delta_k e^{-\frac{\tilde{\mu}}{p+1+t}} \theta_k\right) + t \log\left(\delta_{p+1} e^{-\frac{\tilde{\mu}}{p+1+t}} \theta_{p+1}\right) \ge 0. \end{aligned} \tag{A10}$$
This problem admits the unique solution
$$\left(e^{\frac{\tilde{\mu}}{p+1+t}} \theta(\lambda, t, 0), \ e^{\frac{\tilde{\mu}}{p+1+t}} \theta_{p+1}(\lambda, t, 0)\right),$$
where $\left(\theta(\lambda, t, 0), \theta_{p+1}(\lambda, t, 0)\right)$ is the solution of (24) for $\tilde{\mu} = 0$ and $t \ge 0$. The uniqueness of the solution for (24) and (A10) implies that
$$\left(\theta(\lambda, t, \tilde{\mu}), \theta_{p+1}(\lambda, t, \tilde{\mu})\right) = e^{\frac{\tilde{\mu}}{p+1+t}} \left(\theta(\lambda, t, 0), \theta_{p+1}(\lambda, t, 0)\right) \quad \text{for all } \tilde{\mu} \in \mathbb{R}. \tag{A11}$$
Thus,
$$\frac{\left(\theta(\lambda, t, \tilde{\mu}), \theta_{p+1}(\lambda, t, \tilde{\mu})\right)}{\theta_{p+1}(\lambda, t, \tilde{\mu})} = \frac{\left(\theta(\lambda, t, 0), \theta_{p+1}(\lambda, t, 0)\right)}{\theta_{p+1}(\lambda, t, 0)} \quad \text{for all } \tilde{\mu} \in \mathbb{R}. \tag{A12}$$
This confirms that PR estimates are independent of the normalising constant $\tilde{\mu}$. By selecting $\tilde{\mu}^{*}$ so that $\theta_{p+1}(\lambda, t, \tilde{\mu}^{*}) = 1$, we arrive at the desired result in (26), which is a straightforward consequence of (A11). Moreover, (A11) and (A12) give that
$$\left(\theta(\lambda, t, \tilde{\mu}^{*})^\top, 1\right) = e^{\frac{\tilde{\mu}^{*}}{p+1+t}} \left(\theta(\lambda, t, 0)^\top, \theta_{p+1}(\lambda, t, 0)\right) = \frac{\left(\theta(\lambda, t, 0)^\top, \theta_{p+1}(\lambda, t, 0)\right)}{\theta_{p+1}(\lambda, t, 0)},$$
which gives $\tilde{\mu}^{*} = -(p+1+t) \log \theta_{p+1}(\lambda, t, 0)$. Further, we note that Slater’s condition is satisfied in (24), and therefore strong duality holds in (24).
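Finally, we proceed with the proof of part (iii). Using the notation introduced in parts (i) and (ii), recall that the PR estimates in (21) and (25) satisfy
$$\hat{\beta}^{PR}(\lambda, t) = \frac{\theta(\lambda, t, \mu)}{\theta_{p+1}(\lambda, t, \mu)} \quad \text{and} \quad \hat{\hat{\beta}}^{PR}(\lambda, t) = \frac{\theta(\lambda, t, \tilde{\mu})}{\theta_{p+1}(\lambda, t, \tilde{\mu})} \tag{A13}$$
for any $\mu > 0$ and $\tilde{\mu} \in \mathbb{R}$. Since strong duality holds in (24), let $\gamma^{*}$ be the dual optimal multiplier in (24) associated with the logarithmic constraint
$$\sum_{k=0}^{p} \log\left(\delta_k \theta_k\right) + t \log\left(\delta_{p+1} \theta_{p+1}\right) \ge \tilde{\mu}.$$
Then, (24) is equivalent to minimising the Lagrangian
$$\mathcal{L}(\theta, \theta_{p+1}; \lambda, t, \tilde{\mu}; \gamma) = \mathrm{RRSS}\left(\theta, \theta_{p+1}; \lambda\right) - \gamma \left(\sum_{k=0}^{p} \log\left(\delta_k \theta_k\right) + t \log\left(\delta_{p+1} \theta_{p+1}\right) - \tilde{\mu}\right), \tag{A14}$$
and the KKT conditions imply that $\gamma = \gamma^{*}$, ensuring that the constraint is active, i.e.,
$$\sum_{k=0}^{p} \log\left(\delta_k \theta_k\right) + t \log\left(\delta_{p+1} \theta_{p+1}\right) = \tilde{\mu}.$$
Furthermore, the stationarity conditions for (A14) yield
$$\begin{cases} \mathrm{RRSSC}_k\left(\theta(\lambda, t, \tilde{\mu}), \theta_{p+1}(\lambda, t, \tilde{\mu}); \lambda\right) = \gamma^{*}, & \text{for } k = 0, \dots, p, \\ \mathrm{RRSSC}_k\left(\theta(\lambda, t, \tilde{\mu}), \theta_{p+1}(\lambda, t, \tilde{\mu}); \lambda\right) = \gamma^{*} t, & \text{for } k = p+1. \end{cases} \tag{A15}$$
Therefore, (A4) and (A15) imply that solving the primal in (24) is equivalent to solving (20) with $\mu = \gamma^{*}$. There exists $\alpha > 0$ such that
$$\left(\theta(\lambda, t, \mu)^\top, \theta_{p+1}(\lambda, t, \mu)\right) = \alpha \left(\theta(\lambda, t, \tilde{\mu})^\top, \theta_{p+1}(\lambda, t, \tilde{\mu})\right) \tag{A16}$$
for any $\mu > 0$ and $\tilde{\mu} \in \mathbb{R}$. Equations (A13) and (A16) give $\hat{\beta}^{PR}(\lambda, t) = \hat{\hat{\beta}}^{PR}(\lambda, t)$. The proof is now complete.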

Appendix A.3. Proof of Proposition 2

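The proofs of parts (i) and (ii) are very similar, and thus, we only show part (i). Theorem 2 (i) tells us that there exists $\mu^{*} > 0$ such that $\left(\hat{\beta}^{PR}(\lambda, t)^\top, 1\right)$ uniquely solves (20) with $\mu = \mu^{*}$, and in turn we have that
$$\mathrm{RRSS}\left(\hat{\beta}^{PR}(\lambda, t), 1; \lambda\right) - \mu^{*} \sum_{k=0}^{p} \log\left(\delta_k \hat{\beta}^{PR}_k(\lambda, t)\right) < \mathrm{RRSS}\left(\hat{\beta}^{RR}(\lambda), 1; \lambda\right) - \mu^{*} \sum_{k=0}^{p} \log\left(\delta_k \hat{\beta}^{RR}_k(\lambda)\right).$$
Consequently,
$$0 \le \mathrm{RRSS}\left(\hat{\beta}^{PR}(\lambda, t), 1; \lambda\right) - \mathrm{RRSS}\left(\hat{\beta}^{RR}(\lambda), 1; \lambda\right) < \mu^{*} \sum_{k=0}^{p} \log \frac{\hat{\beta}^{PR}_k(\lambda, t)}{\hat{\beta}^{RR}_k(\lambda)},$$
where the first inequality is due to (4), and in turn we can conclude (27) as $\mu^{*}, t \ge 0$. The proof of part (i) is now complete, which completes the entire proof.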

Appendix B. Data Generation Process for Synthetic Data

This section provides a detailed description of the DGP employed in the numerical experiments presented in Section 3. The design aims to assess the performance of the various estimators under controlled levels of multicollinearity and heteroscedasticity. For reproducibility, the procedure is structured into three main steps, which are outlined below.
Step 1: Feature Generation
(i) Correlation Matrix Construction: We construct a symmetric correlation matrix $\Sigma \in \mathbb{R}^{m \times m}$ for $m$ features. The elements are defined as $\Sigma_{ij} = \rho^{|i-j|}$ for $i \ne j$ and $\Sigma_{ii} = 1$ otherwise, simulating multicollinearity. The correlation parameter $\rho$ is varied across $\{-0.75, -0.5, 0, 0.5, 0.75\}$ to assess the models under different levels of association.
(ii) Feature Vector Simulation: Feature vectors $X_i$ are generated from a multivariate normal distribution $N(0, \Sigma)$. Each feature is subsequently standardised to have zero mean and unit variance.
Step 2: Response Variable Generation
(i) Regression Parameters: We generate the true regression parameters $\beta$, where each component is defined as $\beta_k = (-1)^k k/2$ for $k = 1, \ldots, p$. This introduces a diverse range of predictor effects to test the robustness of the estimators.
(ii) Response Simulation: For each observation $i$, we compute the linear predictor $\eta_i = X_i^{T} \beta$. The response variable $Y_i$ is then simulated from a univariate Gaussian distribution $N(\eta_i, \sigma_i^2)$.
(iii) Heteroscedasticity and Normalisation: To incorporate heteroscedasticity, the standard deviation $\sigma_i$ is drawn from an absolute (folded) normal distribution with a mean of 10 and a standard deviation of 1. Finally, the response variable $Y_i$ is normalised to ensure a zero mean across the entire dataset.
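The response step can be sketched as follows. Everything here is a hedged illustration: the function names are hypothetical, the coefficient rule is read as $\beta_k = (-1)^k k/2$, the noise scale is taken as the absolute value of an $N(10, 1)$ draw, and the feature matrix in the example is a stand-in rather than the output of Step 1.

```python
import random


def true_beta(p):
    """Assumed reading of the coefficient rule: beta_k = (-1)**k * k / 2, k = 1..p."""
    return [(-1) ** k * k / 2 for k in range(1, p + 1)]


def simulate_response(X, beta, rng):
    """Heteroscedastic Gaussian responses, centred to zero mean."""
    y = []
    for x in X:
        eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor
        sigma = abs(rng.gauss(10.0, 1.0))              # observation-specific noise scale
        y.append(rng.gauss(eta, sigma))
    mean_y = sum(y) / len(y)
    return [v - mean_y for v in y]


# Hypothetical stand-in features: 50 observations, 3 covariates.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
beta = true_beta(3)
y = simulate_response(X, beta, rng)
```

Because $\sigma_i$ is centred near 10 while the coefficients are of order one, the noise dominates the signal in this design, which is consistent with the stated aim of testing estimators on noisy data.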
Step 3: Experimental Scale and Repetitions
(i) Dimensions and Sample Sizes: The simulation considers varying numbers of covariates $m = p + 1 \in \{2, 10, 25\}$. For each $m$, the sample size $n$ is determined by specific ratios of the sample size to the number of covariates, namely $n/m \in \{10, 25, 50, 100\}$.
(ii) Statistical Reliability: To ensure the statistical significance of the reported results, all quantities and performance metrics are computed based on $n = 1000$ samples and $N = 1000$ independent repetitions for each scenario.
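The experimental grid implied by Steps 1 and 3 can be enumerated directly. The sketch below is illustrative only: the constant and function names are hypothetical, and the training sample size is taken as $n = (n/m) \times m$ from Step 3 (i).

```python
from itertools import product

FEATURE_COUNTS = (2, 10, 25)             # m = p + 1
RATIOS = (10, 25, 50, 100)               # n / m
RHOS = (-0.75, -0.5, 0.0, 0.5, 0.75)     # correlation parameter


def scenario_grid():
    """One dictionary per (m, n/m, rho) combination."""
    return [
        {"m": m, "p": m - 1, "n": ratio * m, "rho": rho}
        for m, ratio, rho in product(FEATURE_COUNTS, RATIOS, RHOS)
    ]


scenarios = scenario_grid()
```

The grid has 3 × 4 × 5 = 60 entries, which agrees with the 60 scenarios summarised in Table 2.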

Appendix C. Description of WTI and Brent Data

This section provides additional technical details regarding the data sources, the rationale for selecting specific oil benchmarks, the granular timeline of the structural segments used in the empirical analysis, and the overall correlation matrices illustrating the dependence structure among the covariates.

Appendix C.1. Market Selection and Data Sources

The selection of WTI and Brent is motivated by their established status as the primary global price benchmarks for crude oil. WTI serves as the underlying commodity for the New York Mercantile Exchange (NYMEX) futures and reflects North American supply–demand dynamics. Brent, traded on the Intercontinental Exchange (ICE), acts as the benchmark for roughly two-thirds of the world’s internationally traded physical crude oil. Both benchmarks are widely adopted in the commodity literature for analysing risk premiums, price discovery, and factor-based investment strategies (Bakshi et al. 2019; Sakkas and Tessaromatis 2020).
The monthly data, spanning January 1980 to September 2024, were sourced from Bloomberg. They include front-month futures prices used to calculate excess returns, a standard proxy for commodity investment performance in empirical finance (Yang 2013), as well as various market-based and liquidity-based metrics used to construct the predictive factors (Hong and Yogo 2012). Although WTI data have been available since 1983, Brent meets the liquidity requirements for inclusion only from April 1998, so the Brent sample starts at that date to ensure that a fully balanced panel can be formed across all covariates.

Appendix C.2. Detailed Factor Background

The factors employed in our analysis are calculated based on methodologies established in the commodity investing literature, particularly those developed for identifying priced risk premia in futures markets (Sakkas and Tessaromatis 2020). These factors capture distinct dimensions of the commodity risk premium and are described below.
Inventory and Term Structure: Factors such as Basis (slope of the term structure) and Basis Momentum (change in basis) capture the “roll yield” associated with the shape of the futures curve. The Basis factor is rooted in the Theory of Storage (Working 1949) and the Hedging Pressure Hypothesis (Keynes 1930), identifying backwardation and contango as fundamental drivers of expected returns. Basis Momentum, defined as the difference between the momentum signals of the first and second nearby contracts, provides compensation for commodity volatility and curve dynamics (Boons and Prado 2019).
Trend and Sentiment: Momentum (past returns) captures the tendency of commodity returns to persist over a 12-month horizon (Miffre and Rallis 2007). Hedging Pressure (commercial vs. non-commercial positioning) and Open Interest (market liquidity) reflect the positioning of commercial hedgers relative to speculators and the overall risk absorption capacity of the market (Hong and Yogo 2012). Based on the theory of normal backwardation (Keynes 1930), these factors proxy for the risk premium demanded by speculators for providing insurance to producers (Bessembinder 1992). Due to data limitations, Hedging Pressure is omitted for Brent as the Commodity Futures Trading Commission (CFTC) reports primarily cover US-based exchanges like NYMEX.
Risk, Value, and Macro: Skewness (third moment of returns) and Volatility (return dispersion) address non-normal return distributions and “fat tails” typical of energy markets, capturing compensation for jump and variance risks (Fernandez-Perez et al. 2018). Inflation Beta (sensitivity to inflation) captures the historically documented role of commodities as a hedge against unexpected rising price levels (Gorton and Rouwenhorst 2006). Finally, the Value (spot-to-long-term-average ratio) factor identifies long-term mean reversion by comparing current prices to their five-year historical average (Asness et al. 2013).

Appendix C.3. Detailed Structural Breakdown of Testing Periods

We applied the Bai and Perron (2003) test to the monthly returns to identify the structural breakpoints that define the OOS periods, as shown in Figure A1 and Figure A2. This method ensures that the data intervals align with actual changes in market volatility and average returns. Table A1 provides an overview of the 12 periods for WTI and the 11 periods for Brent, with specific annotations for key market events, including periods of major crisis.
Figure A1. Time series of WTI crude oil performance. Notes. This figure depicts the monthly excess returns (A) and accumulated performance (B) of WTI oil from August 1988 to September 2024. The solid red vertical lines indicate the dataset boundaries. The dashed orange vertical lines depict structural breakpoints identified by the Bai and Perron (2003) test, highlighting major economic or market events used for predictive segmentation.
Figure A2. Time series of Brent crude oil performance. Notes. This figure depicts the monthly excess returns (A) and accumulated performance (B) of Brent oil from April 1998 to September 2024. The solid red vertical lines indicate the dataset boundaries. The dashed orange vertical lines represent key structural breakpoints identified by the Bai and Perron (2003) test used to define periods for predictive modelling.
Table A1. Detailed segmented periods for WTI and Brent with key market events.
| | WTI Periods | | | Brent Periods | | |
| --- | --- | --- | --- | --- | --- | --- |
| Period | Start | End | Key Market Event | Start | End | Key Market Event |
| 1 | August 1988 | December 1990 | 1990 Supply Shock | April 1998 | August 2000 | Asian Financial Crisis |
| 2 | January 1991 | October 1993 | Early 90s Oversupply | September 2000 | April 2003 | Early 2000s Recession |
| 3 | November 1993 | January 1997 | Mid-90s Expansion | May 2003 | August 2005 | China Demand Boom |
| 4 | February 1997 | August 2000 | Asian Financial Crisis | September 2005 | August 2008 | Pre-GFC Commodity Supercycle |
| 5 | September 2000 | August 2008 | 2000s Commodity Supercycle | September 2008 | January 2011 | Global Financial Crisis |
| 6 | September 2008 | May 2011 | Global Financial Crisis | February 2011 | May 2014 | Post-Crisis Recovery |
| 7 | June 2011 | June 2014 | Post-Crisis Recovery | June 2014 | March 2016 | 2014–16 Price Collapse |
| 8 | July 2014 | March 2016 | 2014–16 Price Collapse | April 2016 | September 2018 | OPEC+ Production Cuts |
| 9 | April 2016 | September 2018 | OPEC+ Production Cuts | October 2018 | January 2020 | US–China Trade Tension |
| 10 | October 2018 | February 2020 | US–China Trade Tension | February 2020 | July 2022 | COVID-19 Pandemic |
| 11 | March 2020 | July 2022 | COVID-19 Pandemic | August 2022 | September 2024 | Post-Pandemic |
| 12 | August 2022 | September 2024 | Post-Pandemic | | | |
Notes. This table lists the segmented time periods identified by the Bai and Perron (2003) structural breakpoint test for both WTI and Brent crude oil datasets. Major global economic events impacting the volatility and price structure of both benchmarks are highlighted in bold. For each OOS iteration, models are trained on all data prior to the start of the current period to evaluate predictive accuracy in the subsequent regime.
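The expanding-window scheme described in the notes, where each period is predicted by a model trained on all earlier data, can be sketched as below. The function name is hypothetical, and the three period lengths are month counts worked out from the first three WTI rows of Table A1 (29, 34 and 39 months); they serve only as an example.

```python
def expanding_window_splits(period_lengths):
    """Yield (period_number, train_indices, test_indices) for periods 2..K.

    Observations are indexed 0, 1, 2, ... in time order; period 1 is used
    for training only, so it never appears as a test period.
    """
    bounds = []
    start = 0
    for length in period_lengths:
        bounds.append((start, start + length))
        start += length
    for k in range(1, len(bounds)):
        lo, hi = bounds[k]
        yield k + 1, range(0, lo), range(lo, hi)


# Example with the first three WTI periods from Table A1 (lengths in months).
splits = list(expanding_window_splits([29, 34, 39]))
```

With 12 WTI periods this yields 11 predicted periods, and with 11 Brent periods it yields 10, matching the numbers of OOS periods reported in Tables 3 and 4.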

Appendix C.4. Regime Characteristics and Model Stability

The structural periods we have identified encompass a variety of volatility regimes, providing a practical approach for testing both traditional and regularised estimators. The results indicate that the best-performing model depends to a large extent on the macroeconomic environment. During periods of stability or moderate growth, such as the Mid-90s Expansion (WTI Period 3), the 2000s Commodity Supercycle (WTI Period 5) and the Pre-GFC Commodity Supercycle (Brent Period 4), OLS often produces highly competitive or even superior forecasting results. As factor correlations remain relatively stable in these calm markets, standard OLS estimators can capture the data structure without the bias introduced by shrinkage methods.
However, during periods of high market volatility, OLS typically suffers from severe variance inflation. For example, during the OPEC+ Production Cuts (WTI, Period 9) and the COVID-19 pandemic (Brent, Period 10), the breakdown of historical factor relationships led to significant OOS forecasting errors in OLS. Although traditional shrinkage models such as RR and Liu outperform OLS under these conditions, they remain relatively sensitive to sudden spikes in volatility. As these methods employ a single penalty parameter to shrink all coefficients, sudden shocks to specific factors may cause the model to overshrink important signals or produce unstable estimates. This sensitivity often undermines OOS forecasting performance, as exemplified during the Global Financial Crisis (Brent, Period 5). At that time, the RR and Liu models underperformed even OLS due to an overreaction to market noise.
On the other hand, the PR framework, especially $\mathrm{PR}_c$, maintains predictive stability through its explicit parity constraint. By focusing on risk distribution rather than just the overall size of the parameters, PR is less exposed to the volatility spikes that often disrupt standard penalised models. Instead of simply shrinking coefficients like RR or Liu, PR ensures that no single factor takes over the model’s overall risk profile. This prevents overfitting to market noise or extreme events, such as the severe negative WTI returns in April 2020 (WTI, Period 11) or the 2014–2016 Price Collapse (Brent, Period 7). This type of regularisation is highly effective in unstable markets, as the parity algorithm keeps predictions stable even when historical correlations break down entirely.

Appendix C.5. Correlation Matrices for Real Datasets

This section provides the overall Pearson correlation matrices for the covariates used in our empirical analysis to check for multicollinearity. We calculate these matrices using the entire pooled sample period rather than individual rolling windows. Table A2 presents the 9 × 9 correlation matrix for the WTI dataset, and Table A3 presents the 8 × 8 correlation matrix for the Brent dataset.
Specifically, Table A2 reveals dependencies in the WTI dataset, such as a high positive correlation between Momentum (Cov 1) and Basis Momentum (Cov 3) at 0.621, and a moderate negative correlation between Momentum and Value (Cov 9) at −0.436. Similarly, Table A3 shows multicollinearity in the Brent dataset, notably the strong relationship between Momentum and Basis Momentum (0.612) and a moderate positive correlation between Inflation Beta (Cov 5) and Value (Cov 8) at 0.499. These interrelated covariate structures justify our application of regularised estimators to stabilise predictions.
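For readers who wish to reproduce this kind of check on their own data, a pooled Pearson correlation matrix can be computed as in the following sketch. The function names and the toy columns are hypothetical; the data are not the WTI or Brent covariates.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_matrix(columns):
    """Correlation matrix for a list of covariate columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Toy columns: the second is a multiple of the first, the third is reversed.
toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
R = pearson_matrix(toy)
```

The resulting matrix is symmetric with a unit diagonal, so only the upper triangle needs to be inspected for large off-diagonal entries.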
Table A2. Overall correlation matrix for WTI covariates.
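| | Cov 1 | Cov 2 | Cov 3 | Cov 4 | Cov 5 | Cov 6 | Cov 7 | Cov 8 | Cov 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cov 1 | 1.000 | −0.398 | 0.621 | −0.009 | −0.047 | −0.048 | 0.057 | −0.032 | −0.436 |
| Cov 2 | −0.398 | 1.000 | −0.378 | 0.229 | 0.200 | −0.040 | 0.072 | −0.010 | 0.191 |
| Cov 3 | 0.621 | −0.378 | 1.000 | 0.072 | 0.038 | 0.060 | 0.150 | −0.037 | −0.114 |
| Cov 4 | −0.009 | 0.229 | 0.072 | 1.000 | −0.197 | 0.058 | −0.052 | 0.005 | 0.098 |
| Cov 5 | −0.047 | 0.200 | 0.038 | −0.197 | 1.000 | 0.026 | 0.195 | −0.005 | 0.107 |
| Cov 6 | −0.048 | −0.040 | 0.060 | 0.058 | 0.026 | 1.000 | −0.047 | −0.016 | 0.056 |
| Cov 7 | 0.057 | 0.072 | 0.150 | −0.052 | 0.195 | −0.047 | 1.000 | −0.015 | 0.250 |
| Cov 8 | −0.032 | −0.010 | −0.037 | 0.005 | −0.005 | −0.016 | −0.015 | 1.000 | −0.001 |
| Cov 9 | −0.436 | 0.191 | −0.114 | 0.098 | 0.107 | 0.056 | 0.250 | −0.001 | 1.000 |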
Notes. The table presents the Pearson correlation matrix for the nine covariates used in the WTI dataset across the entire pooled sample period. The covariates are defined as follows: Cov 1 = Momentum, Cov 2 = Basis, Cov 3 = Basis Momentum, Cov 4 = Skewness, Cov 5 = Inflation Beta, Cov 6 = Volatility, Cov 7 = Hedging Pressure, Cov 8 = Open Interest, Cov 9 = Value.
Table A3. Overall correlation matrix for Brent covariates.
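| | Cov 1 | Cov 2 | Cov 3 | Cov 4 | Cov 5 | Cov 6 | Cov 7 | Cov 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cov 1 | 1.000 | −0.434 | 0.612 | −0.107 | −0.084 | 0.024 | 0.004 | −0.431 |
| Cov 2 | −0.434 | 1.000 | −0.476 | 0.204 | 0.093 | 0.023 | 0.025 | 0.210 |
| Cov 3 | 0.612 | −0.476 | 1.000 | 0.020 | 0.068 | −0.002 | 0.033 | −0.083 |
| Cov 4 | −0.107 | 0.204 | 0.020 | 1.000 | −0.175 | −0.014 | −0.004 | 0.195 |
| Cov 5 | −0.084 | 0.093 | 0.068 | −0.175 | 1.000 | −0.011 | −0.005 | 0.499 |
| Cov 6 | 0.024 | 0.023 | −0.002 | −0.014 | −0.011 | 1.000 | −0.032 | −0.010 |
| Cov 7 | 0.004 | 0.025 | 0.033 | −0.004 | −0.005 | −0.032 | 1.000 | −0.009 |
| Cov 8 | −0.431 | 0.210 | −0.083 | 0.195 | 0.499 | −0.010 | −0.009 | 1.000 |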
Notes. The table presents the Pearson correlation matrix for the eight covariates used in the Brent dataset across the entire pooled sample period. The covariates are defined as follows: Cov 1 = Momentum, Cov 2 = Basis, Cov 3 = Basis Momentum, Cov 4 = Skewness, Cov 5 = Inflation Beta, Cov 6 = Volatility, Cov 7 = Open Interest, Cov 8 = Value.

Note

1. Available at https://cran.r-project.org/web/packages/savvyPR/index.html (accessed on 17 March 2026).

References

1. Asimit, Alexandru V., Edward Furman, Qihe Tang, and Raluca Vernic. 2011. Asymptotics for risk capital allocations based on conditional tail expectation. Insurance: Mathematics and Economics 49: 310–24.
2. Asimit, Alexandru V., Raluca Vernic, and Ričardas Zitikis. 2013. Evaluating risk measures and capital allocations based on multi-losses driven by a heavy-tailed background risk: The multivariate Pareto-II model. Risks 1: 14–33.
3. Asimit, Vali, Liang Peng, Radu Tunaru, and Feng Zhou. Forthcoming-a. Risk Budgeting Under General Risk Measures. Available online: https://openaccess.city.ac.uk/id/eprint/33733/ (accessed on 5 March 2026).
4. Asimit, Vali, Liang Peng, Ruodu Wang, and Alex Yu. 2019. An efficient approach to quantile capital allocation and sensitivity analysis. Mathematical Finance 29: 1131–56.
5. Asimit, Vali, Marina Anca Cidota, Ziwei Chen, and Jennifer Asimit. Forthcoming-b. Slab and Shrinkage Linear Regression Estimation. Available online: https://openaccess.city.ac.uk/id/eprint/35005/ (accessed on 5 March 2026).
6. Asimit, Vali, Wing Fung Chong, Radu Tunaru, and Feng Zhou. 2025. Portfolio selection and risk sharing via risk budgeting. Insurance: Mathematics and Economics 125: 103139.
7. Asimit, Vali, Ziwei Chen, and Nathan Lassance. Forthcoming-c. Distribution-free shrinkage of high-dimensional mean vector. Journal of Business & Economic Statistics.
8. Asness, Clifford S., Tobias J. Moskowitz, and Lasse Heje Pedersen. 2013. Value and momentum everywhere. The Journal of Finance 68: 929–85.
9. Bai, Jushan, and Pierre Perron. 2003. Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18: 1–22.
10. Bakshi, Gurdip, Xiaohui Gao, and Alberto G. Rossi. 2019. Understanding the sources of risk underlying the cross section of commodity returns. Management Science 65: 459–954.
11. Bessembinder, Hendrik. 1992. Systematic risk, hedging pressure, and risk premiums in futures markets. The Review of Financial Studies 5: 637–67.
12. Bodnar, Olha, Taras Bodnar, and Nestor Parolya. 2022. Recent advances in shrinkage-based high-dimensional inference. Journal of Multivariate Analysis 188: 104826.
13. Boons, Martijn, and Melissa Porras Prado. 2019. Basis momentum. The Journal of Finance 74: 239–79.
14. Chen, Shaobing Scott, and David L. Donoho. 1994. Basis pursuit. Paper presented at the 1994 28th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, October 31–November 2, vol. 1, pp. 41–44.
15. Fernandez-Perez, Adrian, Bart Frijns, Ana-Maria Fuertes, and Joelle Miffre. 2018. The skewness of commodity futures returns. Journal of Banking & Finance 86: 143–58.
16. Gauss, Carl Friedrich. 1821. Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Göttingen: Henricus Dieterich.
17. Gorton, Gary, and K. Geert Rouwenhorst. 2006. Facts and fantasies about commodity futures. Financial Analysts Journal 62: 47–68.
18. Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Berlin/Heidelberg: Springer, vol. 2.
19. Hoerl, Arthur E., and Robert W. Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12: 55–67.
20. Hong, Harrison, and Motohiro Yogo. 2012. What does futures market interest tell us about the macroeconomy and asset prices? Journal of Financial Economics 105: 473–90.
21. James, William, and Charles Stein. 1961. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Oakland: University of California Press, vol. 1, pp. 361–79.
22. Keynes, John Maynard. 1930. A Treatise on Money. London: Macmillan.
23. Liu, Kejian. 1993. A new class of biased estimate in linear regression. Communications in Statistics-Theory and Methods 22: 393–402.
24. Liu, Kejian. 2003. Using Liu-type estimator to combat collinearity. Communications in Statistics-Theory and Methods 32: 1009–20.
25. Markov, Andreĭ Andreevich. 1912. Wahrscheinlichkeitsrechnung. Leipzig: B. G. Teubner.
26. Miffre, Joëlle, and Georgios Rallis. 2007. Momentum strategies in commodity futures markets. Journal of Banking & Finance 31: 1863–86.
27. Sakkas, Athanasios, and Nikolaos Tessaromatis. 2020. Factor based commodity investing. Journal of Banking & Finance 115: 105782.
28. Seber, George A. F., and Alan J. Lee. 2003. Linear Regression Analysis, 2nd ed. Hoboken: John Wiley & Sons.
29. Stein, Charles. 1956. Inadmissibility of the usual estimator for the mean of a multivariate distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Oakland: University of California Press, vol. 1, pp. 197–206.
30. Tasche, Dirk. 1999. Risk Contributions and Performance Measurement. Working Paper, Lehrstuhl für Mathematische Statistik, TU München. Available online: https://www.financerisks.com/filedati/WP/CAPITAL%20ALLOCATION/RISK%20PERFORMANCE%20MEASUREMENT.pdf (accessed on 5 March 2026).
31. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 58: 267–88.
32. Tikhonov, Andrey Nikolayevich. 1943. On the stability of inverse problems. Doklady Akademii Nauk SSSR 39: 195–98.
33. Working, Holbrook. 1949. The theory of price of storage. The American Economic Review 39: 1254–62.
34. Yang, Fan. 2013. Investment shocks and the commodity basis spread. Journal of Financial Economics 110: 164–84.
Figure 1. Geometric interpretation of PR estimation in K 2 ( 1 , 1 ) for p = 1 , λ = 0 , and an appropriately chosen μ > 0 .
Table 1. Simulation study results for different numbers of covariates.
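| | n/m = 10 | | | | | n/m = 25 | | | | | n/m = 50 | | | | | n/m = 100 | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ρ | −0.75 | −0.5 | 0 | 0.5 | 0.75 | −0.75 | −0.5 | 0 | 0.5 | 0.75 | −0.75 | −0.5 | 0 | 0.5 | 0.75 | −0.75 | −0.5 | 0 | 0.5 | 0.75 |
| Panel A: m = 2 | | | | | | | | | | | | | | | | | | | | |
| OLS | 4.15 | 3.23 | 2.82 | 3.23 | 4.15 | 2.66 | 2.07 | 1.80 | 2.07 | 2.66 | 1.81 | 1.43 | 1.27 | 1.43 | 1.81 | 1.31 | 1.02 | 0.90 | 1.02 | 1.31 |
| | (2.84) | (1.96) | (1.53) | (1.96) | (2.84) | (1.66) | (1.17) | (0.94) | (1.17) | (1.66) | (1.13) | (0.80) | (0.67) | (0.80) | (1.13) | (0.79) | (0.56) | (0.46) | (0.56) | (0.79) |
| RR | 2.03 | 1.78 | 1.70 | 1.91 | 2.23 | 1.47 | 1.34 | 1.36 | 1.55 | 1.77 | 1.14 | 1.09 | 1.17 | 1.35 | 1.52 | 0.87 | 0.85 | 0.93 | 1.13 | 1.34 |
| | (2.26) | (1.49) | (1.06) | (1.41) | (2.14) | (1.28) | (0.83) | (0.58) | (0.68) | (1.06) | (0.83) | (0.57) | (0.46) | (0.44) | (0.58) | (0.57) | (0.43) | (0.39) | (0.41) | (0.41) |
| Liu | 2.02 | 1.79 | 1.73 | 1.95 | 2.29 | 1.38 | 1.24 | 1.25 | 1.45 | 1.72 | 1.08 | 1.01 | 1.06 | 1.22 | 1.41 | 0.91 | 0.83 | 0.85 | 1.00 | 1.18 |
| | (2.10) | (1.45) | (1.13) | (1.39) | (1.96) | (1.28) | (0.89) | (0.69) | (0.80) | (1.14) | (0.82) | (0.59) | (0.49) | (0.52) | (0.68) | (0.55) | (0.40) | (0.36) | (0.41) | (0.48) |
| $\mathrm{PR}_c$ | 2.17 | 1.89 | 1.77 | 1.94 | 2.25 | 1.58 | 1.40 | 1.36 | 1.51 | 1.75 | 1.17 | 1.08 | 1.13 | 1.31 | 1.49 | 0.85 | 0.82 | 0.91 | 1.13 | 1.34 |
| | (2.21) | (1.47) | (1.11) | (1.43) | (2.11) | (1.28) | (0.86) | (0.65) | (0.79) | (1.12) | (0.84) | (0.62) | (0.53) | (0.51) | (0.66) | (0.56) | (0.45) | (0.41) | (0.41) | (0.40) |
| $\mathrm{PR}_t$ | 2.30 | 1.93 | 1.77 | 1.92 | 2.24 | 1.75 | 1.46 | 1.35 | 1.47 | 1.72 | 1.36 | 1.13 | 1.09 | 1.20 | 1.40 | 1.05 | 0.83 | 0.77 | 0.92 | 1.13 |
| | (2.20) | (1.48) | (1.12) | (1.46) | (2.13) | (1.30) | (0.90) | (0.67) | (0.82) | (1.17) | (0.96) | (0.71) | (0.58) | (0.61) | (0.78) | (0.86) | (0.62) | (0.50) | (0.52) | (0.61) |
| Panel B: m = 10 | | | | | | | | | | | | | | | | | | | | |
| OLS | 5.83 | 4.06 | 3.24 | 4.05 | 5.82 | 3.60 | 2.51 | 1.99 | 2.51 | 3.60 | 2.56 | 1.78 | 1.42 | 1.78 | 2.56 | 1.79 | 1.25 | 0.99 | 1.25 | 1.79 |
| | (1.64) | (1.07) | (0.77) | (1.06) | (1.63) | (1.01) | (0.65) | (0.46) | (0.66) | (1.01) | (0.72) | (0.47) | (0.33) | (0.47) | (0.72) | (0.50) | (0.33) | (0.23) | (0.33) | (0.50) |
| RR | 3.15 | 3.09 | 3.15 | 4.23 | 6.38 | 2.35 | 2.16 | 1.98 | 2.54 | 3.70 | 1.88 | 1.63 | 1.41 | 1.80 | 2.60 | 1.44 | 1.18 | 0.99 | 1.25 | 1.81 |
| | (1.03) | (0.79) | (0.75) | (1.24) | (2.05) | (0.71) | (0.54) | (0.46) | (0.69) | (1.10) | (0.55) | (0.42) | (0.33) | (0.47) | (0.75) | (0.40) | (0.30) | (0.23) | (0.33) | (0.52) |
| Liu | 4.27 | 3.49 | 3.16 | 4.16 | 5.87 | 3.17 | 2.36 | 1.98 | 2.55 | 3.69 | 2.40 | 1.73 | 1.41 | 1.80 | 2.60 | 1.74 | 1.23 | 0.99 | 1.26 | 1.81 |
| | (1.28) | (0.89) | (0.76) | (1.17) | (1.69) | (0.89) | (0.61) | (0.45) | (0.70) | (1.09) | (0.67) | (0.45) | (0.33) | (0.48) | (0.76) | (0.48) | (0.32) | (0.23) | (0.34) | (0.53) |
| $\mathrm{PR}_c$ | 3.44 | 3.15 | 3.12 | 4.20 | 6.31 | 2.41 | 2.09 | 1.95 | 2.55 | 3.72 | 1.83 | 1.54 | 1.38 | 1.81 | 2.62 | 1.35 | 1.11 | 0.99 | 1.25 | 1.80 |
| | (1.19) | (0.88) | (0.76) | (1.24) | (2.04) | (0.81) | (0.58) | (0.46) | (0.68) | (1.08) | (0.61) | (0.42) | (0.33) | (0.48) | (0.75) | (0.40) | (0.29) | (0.23) | (0.33) | (0.52) |
| $\mathrm{PR}_t$ | 4.19 | 3.26 | 3.10 | 4.21 | 6.18 | 2.68 | 2.07 | 1.99 | 2.78 | 4.01 | 1.93 | 1.52 | 1.51 | 2.19 | 3.20 | 1.33 | 1.10 | 1.22 | 1.90 | 2.76 |
| | (1.59) | (1.06) | (0.80) | (1.28) | (1.95) | (1.21) | (0.75) | (0.51) | (0.69) | (1.12) | (0.95) | (0.56) | (0.34) | (0.48) | (0.72) | (0.62) | (0.29) | (0.21) | (0.31) | (0.42) |
| Panel C: m = 25 | | | | | | | | | | | | | | | | | | | | |
| OLS | 6.17 | 4.25 | 3.31 | 4.25 | 6.18 | 3.75 | 2.58 | 2.03 | 2.58 | 3.75 | 2.63 | 1.81 | 1.42 | 1.81 | 2.63 | 1.85 | 1.27 | 1.00 | 1.27 | 1.85 |
| | (1.12) | (0.73) | (0.49) | (0.73) | (1.12) | (0.64) | (0.42) | (0.29) | (0.42) | (0.64) | (0.45) | (0.30) | (0.20) | (0.30) | (0.45) | (0.32) | (0.21) | (0.14) | (0.21) | (0.32) |
| RR | 4.66 | 3.99 | 3.31 | 4.26 | 6.18 | 3.22 | 2.52 | 2.03 | 2.59 | 3.75 | 2.41 | 1.79 | 1.42 | 1.81 | 2.63 | 1.75 | 1.27 | 0.99 | 1.27 | 1.85 |
| | (0.85) | (0.66) | (0.50) | (0.73) | (1.15) | (0.55) | (0.40) | (0.29) | (0.41) | (0.65) | (0.41) | (0.29) | (0.20) | (0.30) | (0.45) | (0.30) | (0.21) | (0.14) | (0.21) | (0.32) |
| Liu | 5.97 | 4.19 | 3.31 | 4.26 | 6.19 | 3.70 | 2.58 | 2.03 | 2.60 | 3.75 | 2.63 | 1.81 | 1.42 | 1.81 | 2.66 | 1.85 | 1.27 | 1.00 | 1.27 | 1.85 |
| | (1.08) | (0.71) | (0.50) | (0.74) | (1.14) | (0.63) | (0.41) | (0.29) | (0.42) | (0.64) | (0.45) | (0.29) | (0.20) | (0.30) | (0.46) | (0.32) | (0.21) | (0.14) | (0.21) | (0.32) |
| $\mathrm{PR}_c$ | 4.77 | 3.89 | 3.31 | 4.29 | 6.20 | 3.08 | 2.43 | 2.02 | 2.59 | 3.78 | 2.27 | 1.74 | 1.41 | 1.82 | 2.64 | 1.68 | 1.24 | 1.00 | 1.27 | 1.84 |
| | (1.06) | (0.69) | (0.50) | (0.73) | (1.14) | (0.59) | (0.40) | (0.29) | (0.42) | (0.65) | (0.41) | (0.28) | (0.20) | (0.30) | (0.46) | (0.29) | (0.20) | (0.14) | (0.21) | (0.32) |
| $\mathrm{PR}_t$ | 5.16 | 3.96 | 3.43 | 4.76 | 7.16 | 3.22 | 2.43 | 2.23 | 3.43 | 5.45 | 2.30 | 1.75 | 1.73 | 2.96 | 4.89 | 1.66 | 1.31 | 1.44 | 2.70 | 4.60 |
| | (1.49) | (0.85) | (0.54) | (0.78) | (1.13) | (0.94) | (0.47) | (0.32) | (0.46) | (0.67) | (0.61) | (0.29) | (0.22) | (0.36) | (0.49) | (0.31) | (0.18) | (0.16) | (0.27) | (0.34) |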
Notes. This table reports the average $L_2$-distances, as defined in (30), with the corresponding standard deviations (in brackets) defined in (31), based on $N = 1000$ independent repetitions with sample size $n = 1000$. We compare OLS, RR, Liu, and the two PR variants ($\mathrm{PR}_c$, $\mathrm{PR}_t$) across correlation parameters $\rho \in \{-0.75, -0.5, 0, 0.5, 0.75\}$ and ratios $n/m \in \{10, 25, 50, 100\}$. Results are organised into Panel A ($m = 2$), Panel B ($m = 10$), and Panel C ($m = 25$). Red values indicate the lowest average $L_2$-distance, while underlined values denote the second-best performing estimator in each scenario.
Table 2. Summary of simulation study.
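| | Panel A (m = 2) | | Panel B (m = 10) | | Panel C (m = 25) | | Total (All Panels) | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Model | Best | 2nd | Best | 2nd | Best | 2nd | Best | 2nd |
| OLS | 0 | 0 | 9 | 0 | 8 | 5 | 17 | 5 |
| RR | 4 | 5 | 3 | 6 | 2 | 4 | 9 | 15 |
| Liu | 9 | 8 | 0 | 4 | 0 | 1 | 9 | 13 |
| $\mathrm{PR}_c$ | 2 | 2 | 3 | 9 | 9 | 5 | 14 | 16 |
| $\mathrm{PR}_t$ | 5 | 5 | 5 | 1 | 1 | 5 | 11 | 11 |
| PR (Combined) | 7 | 7 | 8 | 10 | 10 | 10 | 25 | 27 |
| Total Scenarios | 20 | 20 | 20 | 20 | 20 | 20 | 60 | 60 |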
Notes. This table summarises the performance of all estimators across 60 independent scenarios, based on the results reported in Table 1. “Best” and “2nd” denote the number of times an estimator achieved the lowest and second-lowest average $L_2$-distance, respectively.
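The "Best" and "2nd" counts can be reproduced from per-scenario averages with a simple ranking. The sketch below is illustrative: the function names are hypothetical, the two scenarios are the $n/m = 10$, $\rho = -0.75$ columns of Panels A and C in Table 1, and ties are broken by dictionary order, which may differ from the rule behind the published counts.

```python
def best_and_second(scores):
    """Names of the estimators with the lowest and second-lowest score."""
    ranked = sorted(scores, key=scores.get)
    return ranked[0], ranked[1]


def tally(scenarios):
    """Count how often each estimator ranks first or second."""
    counts = {name: {"best": 0, "second": 0} for name in scenarios[0]}
    for scores in scenarios:
        first, second = best_and_second(scores)
        counts[first]["best"] += 1
        counts[second]["second"] += 1
    return counts


# Average L2-distances for n/m = 10 and rho = -0.75 (Table 1, Panels A and C).
panel_a = {"OLS": 4.15, "RR": 2.03, "Liu": 2.02, "PR_c": 2.17, "PR_t": 2.30}
panel_c = {"OLS": 6.17, "RR": 4.66, "Liu": 5.97, "PR_c": 4.77, "PR_t": 5.16}
counts = tally([panel_a, panel_c])
```

In this two-scenario example Liu and RR each rank first once, while RR and $\mathrm{PR}_c$ each rank second once.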
Table 3. Performance comparison for WTI monthly returns.
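| Predicted Period | RMSE (%) | | | | | MAE (%) | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | OLS | RR | Liu | $\mathrm{PR}_c$ | $\mathrm{PR}_t$ | OLS | RR | Liu | $\mathrm{PR}_c$ | $\mathrm{PR}_t$ |
| 2 | 79.89 | 25.85 | 32.63 | 24.99 | 26.49 | 72.22 | 21.31 | 26.33 | 22.08 | 24.16 |
| 3 | 19.48 | 19.76 | 19.67 | 19.53 | 19.57 | 16.09 | 16.62 | 16.41 | 16.42 | 16.47 |
| 4 | 37.40 | 33.03 | 34.74 | 33.86 | 34.92 | 29.14 | 26.30 | 27.49 | 26.74 | 27.52 |
| 5 | 10.31 | 14.82 | 13.26 | 10.95 | 14.89 | 8.63 | 12.38 | 11.23 | 9.23 | 12.45 |
| 6 | 38.94 | 92.96 | 91.21 | 79.65 | 90.42 | 31.47 | 82.77 | 81.57 | 69.84 | 80.21 |
| 7 | 28.04 | 12.04 | 15.13 | 14.20 | 11.27 | 23.96 | 9.52 | 13.10 | 11.86 | 8.81 |
| 8 | 30.73 | 12.24 | 11.88 | 11.57 | 12.10 | 26.95 | 9.42 | 9.24 | 9.01 | 9.32 |
| 9 | 138.28 | 9.26 | 42.43 | 8.71 | 131.42 | 52.93 | 7.75 | 22.09 | 6.99 | 51.41 |
| 10 | 31.80 | 16.58 | 15.47 | 16.62 | 20.21 | 23.77 | 13.33 | 12.84 | 13.36 | 15.48 |
| 11 | 67.56 | 30.65 | 33.66 | 28.51 | 29.00 | 57.45 | 20.92 | 23.95 | 20.40 | 20.32 |
| 12 | 51.40 | 43.35 | 43.62 | 43.35 | 43.35 | 47.98 | 42.10 | 42.43 | 42.10 | 42.10 |
| Best Count | 3 | 2 | 1 | 4 | 1 | 3 | 3 | 1 | 2 | 2 |
| 2nd Best | 0 | 4 | 1 | 5 | 1 | 0 | 3 | 2 | 6 | 0 |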
Notes. The table presents the RMSE and MAE (in percentage points) defined in (33) for OLS, RR, Liu, $\mathrm{PR}_c$ and $\mathrm{PR}_t$ over 11 OOS predicted periods. For each period, the model with the best performance is highlighted in red, while the second-best is underlined. The bottom two rows summarise the total count of best and second-best performances achieved by each estimator across all periods.
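As a reminder of what the two error measures capture, the sketch below implements the textbook RMSE and MAE. It assumes that (33) follows these standard definitions; the function names and the toy numbers are hypothetical and are not taken from the WTI or Brent data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more heavily."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error; every miss is weighted linearly."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Toy monthly returns in percentage points and a naive all-zero forecast.
actual = [1.0, -2.0, 3.0]
predicted = [0.0, 0.0, 0.0]
errors = (rmse(actual, predicted), mae(actual, predicted))
```

Since RMSE is never smaller than MAE, a wide gap between the two, as for OLS in Period 9, points to a few very large forecast errors rather than uniformly poor predictions.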
Table 4. Performance comparison for Brent monthly returns.
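| Predicted Period | RMSE (%) | | | | | MAE (%) | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | OLS | RR | Liu | $\mathrm{PR}_c$ | $\mathrm{PR}_t$ | OLS | RR | Liu | $\mathrm{PR}_c$ | $\mathrm{PR}_t$ |
| 2 | 36.14 | 30.13 | 28.91 | 30.45 | 30.03 | 31.92 | 26.69 | 25.47 | 27.01 | 26.60 |
| 3 | 13.79 | 14.24 | 13.99 | 14.17 | 14.20 | 11.16 | 11.99 | 11.72 | 11.81 | 11.88 |
| 4 | 8.62 | 10.63 | 8.80 | 10.11 | 11.02 | 6.89 | 8.69 | 6.90 | 8.09 | 8.68 |
| 5 | 83.72 | 123.75 | 122.31 | 66.33 | 138.25 | 80.43 | 111.96 | 110.77 | 63.22 | 122.86 |
| 6 | 60.75 | 22.98 | 36.27 | 26.13 | 28.84 | 29.01 | 13.74 | 18.64 | 14.80 | 15.67 |
| 7 | 11.69 | 12.48 | 11.69 | 10.92 | 11.05 | 8.99 | 10.72 | 9.76 | 8.90 | 9.10 |
| 8 | 40.52 | 23.52 | 16.66 | 22.62 | 21.02 | 33.75 | 19.18 | 13.68 | 18.44 | 17.18 |
| 9 | 21.14 | 15.72 | 15.80 | 15.62 | 16.24 | 18.09 | 13.82 | 13.87 | 13.70 | 14.41 |
| 10 | 163.04 | 61.51 | 72.91 | 28.44 | 127.71 | 110.94 | 40.70 | 49.40 | 25.62 | 81.05 |
| 11 | 68.70 | 42.69 | 48.40 | 42.71 | 42.71 | 67.32 | 41.45 | 47.32 | 41.48 | 41.48 |
| Best Count | 2 | 2 | 2 | 4 | 0 | 2 | 2 | 2 | 4 | 0 |
| 2nd Best | 1 | 2 | 2 | 2 | 4 | 2 | 2 | 2 | 2 | 3 |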
Notes. The table presents the RMSE and MAE (in percentage points) defined in (33) for OLS, RR, Liu, $\mathrm{PR}_c$ and $\mathrm{PR}_t$ over 10 OOS predicted periods. For each period, the model with the best performance is highlighted in red, while the second-best is underlined. The bottom two rows summarise the total count of best and second-best performances achieved by each estimator across all periods.

Asimit, Vali, Ziwei Chen, Bogdan Ichim, and Pietro Millossovich. 2026. "Parity Regression Estimation" Risks 14, no. 4: 94. https://doi.org/10.3390/risks14040094
