Parity Regression Estimation

Asimit, Vali; Chen, Ziwei; Ichim, Bogdan; Millossovich, Pietro

doi:10.3390/risks14040094


Parity Regression Estimation^†

¹ Bayes Business School, City St George’s, University of London, 106 Bunhill Row, London EC1Y 8TZ, UK

² Faculty of Mathematics and Computer Science, University of Bucharest, Str. Academiei 14, 010014 Bucharest, Romania

³ Research Unit 5, Simion Stoilow Institute of Mathematics of the Romanian Academy, C.P. 1-764, 010702 Bucharest, Romania

⁴ Department of Economics, Business, Mathematics and Statistics (DEAMS), Università Degli Studi di Trieste, Via Università 1, 34127 Trieste, Italy

* Author to whom correspondence should be addressed.

† An R package implementing our estimators is available on CRAN at https://cran.r-project.org/web/packages/savvyPR/index.html (accessed on 17 March 2026).

Risks 2026, 14(4), 94; https://doi.org/10.3390/risks14040094

Submission received: 5 March 2026 / Revised: 9 April 2026 / Accepted: 13 April 2026 / Published: 21 April 2026


Abstract

Multiple linear regression remains a foundational predictive methodology across a broad range of applications. We propose a novel regression framework that, rather than minimising the aggregate prediction error associated with the dependent variable, explicitly distributes the risk evenly across all model parameters. This approach provides a structural safeguard that is particularly suitable for data affected by substantial noise, as is often the case in time series environments characterised by regime shifts, structural breaks, and evolving trends. We provide a theoretical characterisation of our proposed estimator, named Parity Regression, and benchmark its analytical properties against existing penalised and shrinkage estimators in the literature. Both synthetic experiments and empirical applications demonstrate that the theoretical guarantees of the proposed method translate into enhanced out-of-sample forecasting stability in practice.

Keywords:

ordinary least squares; parity; ridge regression; shrinkage estimation

JEL Classification:

C13; C53; C58; G17

1. Introduction

Multiple linear regression is one of the most widely used predictive and inferential models across a broad range of scientific disciplines, including economics, engineering, medicine, and the social sciences. The model relates a scalar response random variable $Y$ to a set of explanatory random variables $X_{1}, X_{2}, \dots, X_{p}$, where $p$ denotes the number of covariates, through the linear model

$Y = θ_{0} + θ_{1} X_{1} + \dots + θ_{p} X_{p} + ε,$

where $θ = {(θ_{0}, θ_{1}, \dots, θ_{p})}^{⊤}$ is the unknown parameter vector and $ε$ is a random error term with $E [ε] = 0$. For a sample of size $n$ drawn from $(Y, X)$, let $y = {(y_{1}, y_{2}, \dots, y_{n})}^{⊤}$ denote the observed vector of responses and let $\tilde{X}$ be the $n \times (p + 1)$ design matrix with rows ${\tilde{x}}_{i}^{⊤}$, where ${\tilde{x}}_{i} = {(1, x_{i}^{⊤})}^{⊤}$ for all $1 \leq i \leq n$.

A common objective in regression analysis is to estimate $θ$ via estimators with a low mean squared error (MSE); the MSE of an estimator is defined in (3). This is typically achieved by minimising a loss functional $L$ measuring the discrepancy between the dependent variable and its linear predictor. The standard choice is the $l_{2}$ loss, which leads to the well-known Ordinary Least Squares (OLS) estimator, denoted by ${\hat{θ}}^{OLS}$, which is obtained by minimising the residual sum of squares (RSS):

${\hat{θ}}^{OLS} : = \underset{θ \in R^{p + 1}}{argmin} RSS (θ), \quad \text{where} \quad RSS (θ) : = \frac{1}{n} \sum_{i = 1}^{n} {(θ^{⊤} {\tilde{x}}_{i} - y_{i})}^{2} .$

(1)

Here, $L = RSS$. The OLS estimator has a closed-form solution as follows:

${\hat{θ}}^{OLS} = {({\tilde{X}}^{⊤} \tilde{X})}^{- 1} {\tilde{X}}^{⊤} y,$

(2)

provided that ${\tilde{X}}^{⊤} \tilde{X}$ is invertible. Note that ${\tilde{X}}^{⊤} \tilde{X}$ is a symmetric matrix that is always positive semi-definite but not necessarily positive definite; positive definiteness would guarantee the existence of ${({\tilde{X}}^{⊤} \tilde{X})}^{- 1}$. The OLS estimator is often called the Best Linear Unbiased Estimator (BLUE) according to the Gauss–Markov Theorem (Gauss 1821; Markov 1912), which guarantees that it has the lowest variance among all linear unbiased estimators. Furthermore, if the error term is normally distributed, OLS coincides with the maximum likelihood estimator, allowing for exact finite-sample inference (Seber and Lee 2003).
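The closed form (2) can be illustrated with a short sketch. The function name `ols_closed_form` and the synthetic data are assumptions made for this example only; the normal equations are solved as a linear system rather than by forming the inverse, and the result is cross-checked against NumPy's least-squares routine.

```python
import numpy as np

def ols_closed_form(X, y):
    """Evaluate (2); X holds the p covariates column-wise, without the intercept."""
    X_tilde = np.column_stack([np.ones(X.shape[0]), X])  # rows (1, x_i^T)
    gram = X_tilde.T @ X_tilde
    # Solve the normal equations rather than inverting the Gram matrix explicitly.
    return np.linalg.solve(gram, X_tilde.T @ y)

# Illustrative synthetic data; all values below are made up for the example.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = theta_true[0] + X @ theta_true[1:] + 0.1 * rng.normal(size=n)

theta_ols = ols_closed_form(X, y)
# Cross-check against NumPy's SVD-based least-squares solver.
theta_lstsq = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
```

When the Gram matrix is singular or nearly so, `np.linalg.solve` fails or becomes unstable, which is the situation that motivates the regularised estimators reviewed later.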

A limitation of (2) is that its estimation error is driven by the estimation error of ${\tilde{X}}^{⊤} \tilde{X}$, which can be highly problematic since the empirical eigenvalues of a matrix are often poor estimators of the population eigenvalues; for a detailed discussion, see Asimit et al. (forthcoming-b), with a summary provided in the Literature Review. When the sample is affected by substantial noise, as is common in time series data with structural changes or evolving trends, the out-of-sample (OOS) performance of OLS deteriorates further. This motivates the need for a more robust linear regression estimator suitable for such settings.

In this paper, we introduce our novel regression method, Parity Regression (PR), and outline three primary contributions. First, we propose the PR estimator, which, rather than minimising the global empirical risk, distributes the prediction error evenly across all model parameters, and we provide a rigorous theoretical characterisation of it. Second, we empirically show that our estimator outperforms OLS, as well as existing penalised and shrinkage estimators, both on synthetic simulations and real-world datasets. Third, to facilitate the reproduction of results and practical application, we have made the proposed estimator publicly available via the R package savvyPR on CRAN.

Literature Review

Our review of the literature begins with Stein’s paradox (James and Stein 1961; Stein 1956), which marked a fundamental shift in statistical thinking by demonstrating that shrinkage can systematically improve estimation accuracy under the MSE criterion. By showing that deliberate introduction of bias may reduce the overall estimation error through a variance–bias trade-off, it provided a conceptual foundation for modern regularisation techniques. Although shrinkage in its classical form emerged after Tikhonov regularisation (Tikhonov et al. 1943), which laid the groundwork for penalised regression, the underlying principle is closely related. The common thread linking penalised regression and shrinkage, despite their origins in different applications, is that a controlled introduction of bias can substantially reduce the estimator variability, thereby yielding an estimator that outperforms the natural unbiased estimator.

We begin by setting aside the estimation of the regression parameter vector $θ$ and instead consider the problem of estimating the population mean vector $μ$, thereby clarifying the foundations of the shrinkage principle arising from Stein’s paradox. This paradox, introduced in the seminal papers by James and Stein (1961) and Stein (1956), fundamentally challenges the established statistical paradigm. Its core premise is that, although unbiased estimators possess robust theoretical properties, they may still be strictly suboptimal when efficiency is assessed using the MSE criterion. Recall that the MSE of a generic estimator $\hat{θ}$ of $θ$ is defined as

$MSE (\hat{θ}) : = Var (\hat{θ}) + {(Bias (\hat{θ}))}^{2} .$

(3)

The estimator proposed in James and Stein (1961) and Stein (1956), commonly referred to as the James–Stein estimator, demonstrates that the sample mean vector $\bar{X} \in R^{p}$ is a sub-optimal estimator of the population mean vector $μ$. Assuming a multivariate normal sampling distribution, an estimator that strictly dominates the sample mean in terms of MSE can be constructed via multiplicative shrinkage, $\hat{μ} = c \bar{X}$, where $c$ represents the theoretically optimal shrinkage intensity. This estimator is often termed the oracle shrinkage estimator, as it still depends on unknown population parameters and is therefore not fully data-driven. In practice, substituting these unknown population parameters with sample estimates gives a fully data-driven counterpart, often referred to as a bona fide shrinkage estimator. For example, the James–Stein estimator derived in James and Stein (1961) is given by

${\hat{μ}}_{PJS} : = {\left(1 - \frac{(p - 2) {\hat{σ}}^{2}}{n {∥ \bar{X} ∥}_{2}^{2}}\right)}_{+} \bar{X}, \quad \text{for all } p \geq 3 \text{ and } n \geq 2,$

where $t_{+} : = max (t, 0)$ and ${\hat{σ}}^{2} : = \frac{1}{p} Tr (S)$, with $S$ denoting the sample covariance matrix estimator; note that ${∥ \cdot ∥}_{p}$ denotes the usual $p$-norm. For a comprehensive treatment of mean vector shrinkage estimation, the reader is referred to Asimit et al. (forthcoming-c) and Bodnar et al. (2022).
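The positive-part formula above can be written out in a few lines. This is a minimal sketch, assuming that $S$ is the unbiased sample covariance (NumPy's default) and using made-up Gaussian data; the function name `james_stein_positive_part` is hypothetical.

```python
import numpy as np

def james_stein_positive_part(samples):
    """Positive-part James-Stein estimate of the mean from an n x p sample."""
    n, p = samples.shape
    if p < 3 or n < 2:
        raise ValueError("the formula requires p >= 3 and n >= 2")
    x_bar = samples.mean(axis=0)
    S = np.cov(samples, rowvar=False)          # assumed: unbiased sample covariance
    sigma2_hat = np.trace(S) / p               # average variance, Tr(S) / p
    shrink = 1.0 - (p - 2) * sigma2_hat / (n * (x_bar @ x_bar))
    return max(shrink, 0.0) * x_bar            # t_+ = max(t, 0)

rng = np.random.default_rng(1)
samples = rng.normal(loc=0.3, scale=1.0, size=(20, 10))   # illustrative data
mu_js = james_stein_positive_part(samples)
x_bar = samples.mean(axis=0)
```

By construction the estimate is a non-negative multiple of the sample mean with a factor of at most one, so its Euclidean norm never exceeds that of $\bar{X}$.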

Beyond yielding mean vector estimators with strictly reduced estimation error, Stein’s paradox establishes a generalised shrinkage principle that extends well beyond the confines of high-dimensional mean estimation. In particular, this principle can be applied to the estimation of the regression parameter vector $θ$. We therefore provide a succinct review of shrinkage estimators, which deliberately introduce bias in order to reduce the overall MSE through a variance-bias trade-off, a concept that lies at the core of Cross Validation (CV) in statistics and machine learning.

An alternative to OLS is Ridge Regression (RR), introduced by Hoerl and Kennard (1970), which is designed to mitigate overfitting by shrinking the regression parameters and is particularly useful in the presence of multicollinearity or ill-conditioning (i.e., when ${\tilde{X}}^{⊤} \tilde{X}$ possesses zero or near-zero eigenvalues). RR minimises the $L_{2}$-penalised $RSS$ as follows:

${\hat{θ}}^{RR} (λ) : = \underset{θ \in R^{p + 1}}{argmin} \{RSS (θ) + λ {∥ θ ∥}_{2}^{2}\}, \quad λ \geq 0,$

(4)

where $λ$ is a tuning parameter controlling the strength of the penalty. The solution admits the closed-form expression

${\hat{θ}}^{RR} (λ) = {({\tilde{X}}^{⊤} \tilde{X} + λ I_{p + 1})}^{- 1} {\tilde{X}}^{⊤} y,$

(5)

where $λ > 0$ guarantees that ${({\tilde{X}}^{⊤} \tilde{X} + λ I_{p + 1})}^{- 1}$ exists. The penalisation term reduces estimation error, particularly when some eigenvalues of ${\tilde{X}}^{⊤} \tilde{X}$ are zero or close to zero.

By standard duality arguments, (4) is equivalent to the constrained formulation

$min_{θ \in R^{p + 1}} RSS (θ) \quad \text{subject to} \quad \sum_{k = 0}^{p} θ_{k}^{2} \leq \tilde{λ}, \quad \tilde{λ} \geq 0,$

(6)

where $\tilde{λ}$ controls the size of the constraint set.

RR is a particular case of Tikhonov regularisation (Tikhonov et al. 1943), a broader framework for addressing ill-posed estimation problems. Specifically, for a penalty function $g : R^{p + 1} \to R_{+}$, the Tikhonov estimator is defined as

$\hat{θ} : = \underset{θ \in R^{p + 1}}{argmin} \{\frac{1}{2} {∥ y - \tilde{X} θ ∥}_{2}^{2} + g (θ)\} .$

(7)

When $g (θ) = λ {∥ θ ∥}_{2}^{2}$, the Tikhonov estimator reduces to the RR estimator, up to a rescaling of $λ$. Since

${\hat{θ}}^{RR} (0) = {\hat{θ}}^{OLS} \quad \text{and} \quad {\hat{θ}}^{RR} (λ) \to 0 \text{ as } λ \to \infty,$

RR is a shrinkage estimator that increasingly biases the estimates towards the origin as $λ$ grows. Hoerl and Kennard (1970) showed that there exists an oracle value $λ^{★} > 0$ such that

$MSE ({\hat{θ}}^{RR} (λ^{★})) < MSE ({\hat{θ}}^{OLS}),$

demonstrating that RR can outperform OLS when $λ$ is suitably chosen. CV provides a practical method for selecting a bona fide estimate of $λ^{★}$; however, the in-sample optimal choice need not be close to the OOS optimal choice, which may increase the estimation error of the linear model.
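The two limits of the RR path stated above can be checked numerically. The sketch below takes the closed form (5) at face value on made-up, nearly collinear data; the function name `ridge_closed_form` and the grid of $λ$ values are illustrative assumptions.

```python
import numpy as np

def ridge_closed_form(X_tilde, y, lam):
    """Evaluate (5) for a design matrix that already contains the intercept column."""
    q = X_tilde.shape[1]
    return np.linalg.solve(X_tilde.T @ X_tilde + lam * np.eye(q), X_tilde.T @ y)

rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)            # nearly collinear with x1
X_tilde = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + x1 + x2 + 0.5 * rng.normal(size=n)

lambdas = [0.0, 0.1, 1.0, 10.0, 1e3, 1e9]
# Euclidean norm of the ridge solution along the grid of penalties.
norms = [np.linalg.norm(ridge_closed_form(X_tilde, y, lam)) for lam in lambdas]
theta_ols = np.linalg.lstsq(X_tilde, y, rcond=None)[0]
```

At $λ = 0$ the function returns the OLS solution, and the norm of the solution decreases towards zero as $λ$ grows, which is the shrinkage behaviour described in the text.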

When $g (θ) = t {∥ θ ∥}_{1}$ with $t \geq 0$, the Tikhonov estimator reduces to the Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani 1996) and Basis Pursuit Denoising (Chen and Donoho 1994). The $L_{1}$-norm penalty induces sparsity by driving certain coefficients exactly to zero, thereby performing explicit variable selection and regularisation. Owing to the non-differentiability of the $L_{1}$ penalty, LASSO does not admit a closed-form solution; however, it is equivalent to the constrained optimisation problem

$min_{θ \in R^{p + 1}} RSS (θ) \quad \text{subject to} \quad \sum_{k = 0}^{p} | θ_{k} | \leq \tilde{t}, \quad \tilde{t} \geq 0 .$

(8)

Although both LASSO and RR regularise the model, they differ fundamentally in their mechanisms. LASSO induces sparsity by setting certain parameters exactly to zero, thereby selecting a subset of predictors and enhancing interpretability. In contrast, RR shrinks the parameters continuously towards zero without eliminating any of them entirely.
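The contrast between the two mechanisms is easiest to see in a special case that the text does not assume: an orthonormal design, ${\tilde{X}}^{⊤} \tilde{X} = I$. In that case the Tikhonov problem (7) separates across coordinates, the $L_{1}$ penalty gives soft-thresholding of the OLS coefficients $z = {\tilde{X}}^{⊤} y$, and the squared $L_{2}$ penalty gives proportional shrinkage. The function names and the numbers below are hypothetical.

```python
import numpy as np

def lasso_orthonormal(z, t):
    """Soft-thresholding: minimiser of 0.5*||theta - z||^2 + t*||theta||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ridge_orthonormal(z, lam):
    """Proportional shrinkage: minimiser of 0.5*||theta - z||^2 + lam*||theta||_2^2."""
    return z / (1.0 + 2.0 * lam)

z = np.array([3.0, -0.5, 1.2])     # hypothetical OLS coefficients
theta_lasso = lasso_orthonormal(z, t=1.0)
theta_ridge = ridge_orthonormal(z, lam=1.0)
```

Here the coefficient whose magnitude is below the threshold is set exactly to zero by the $L_{1}$ rule, whereas the $L_{2}$ rule rescales every coefficient by the same factor and leaves all of them non-zero.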

RR and LASSO are examples of penalised regression methods that can be interpreted as shrinkage estimators, although they are not constructed explicitly as shrinkage procedures. In contrast, there exists a broad class of estimators that directly shrink the OLS estimator towards a specified target. Such shrinkage estimators are typically simple, admitting closed-form expressions that are designed to optimise the theoretical MSE. Ideally, the corresponding oracle optimal shrinkage estimator is available in closed form, with its plug-in counterpart serving as a bona fide estimator, although CV may alternatively be employed. While both approaches exhibit distinct computational and theoretical trade-offs during implementation, this practical distinction remains largely underexplored in the existing literature.

The Liu estimator (Liu 1993), abbreviated as Liu, is a shrinkage estimator that directly shrinks the OLS estimator towards the target ${({\tilde{X}}^{⊤} \tilde{X} + I_{p + 1})}^{- 1} {\tilde{X}}^{⊤} y$. Specifically, it modifies the OLS estimator as follows:

${\hat{θ}}^{Liu} (d) = {({\tilde{X}}^{⊤} \tilde{X} + I_{p + 1})}^{- 1} ({\tilde{X}}^{⊤} y + d {\hat{θ}}^{OLS}),$

(9)

where $d \in (0, 1)$ is a shrinkage parameter. Under certain conditions, Liu (1993) showed that there exists an optimal $d^{★} \in (0, 1)$ such that

$MSE ({\hat{θ}}^{Liu} (d^{★})) < MSE ({\hat{θ}}^{OLS}) .$

Moreover, the oracle optimal shrinkage parameter $d^{★}$ admits a closed-form expression. Nevertheless, in practice, standard software implementations typically select $d$ via CV, similarly to the RR estimator. Finally, note that ${\hat{θ}}^{Liu} (1) = {\hat{θ}}^{OLS}$.
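A brief numerical sketch of (9) on made-up data follows. The function name `liu_estimator` is hypothetical, and the design is assumed to have full column rank so that the OLS solution exists.

```python
import numpy as np

def liu_estimator(X_tilde, y, d):
    """Evaluate (9): (X'X + I)^{-1} (X'y + d * theta_OLS)."""
    q = X_tilde.shape[1]
    gram = X_tilde.T @ X_tilde
    xty = X_tilde.T @ y
    theta_ols = np.linalg.solve(gram, xty)     # requires an invertible Gram matrix
    return np.linalg.solve(gram + np.eye(q), xty + d * theta_ols)

rng = np.random.default_rng(3)
n = 30
X_tilde = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X_tilde @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=n)

theta_ols = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y)
# Shrinkage target: the ridge-type solution with unit penalty, obtained at d = 0.
ridge_unit = np.linalg.solve(X_tilde.T @ X_tilde + np.eye(3), X_tilde.T @ y)
```

Because (9) is affine in $d$, the estimator moves along the straight line between the target (at $d = 0$) and the OLS estimator (at $d = 1$).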

Liu (2003) extended this framework by proposing a two-parameter Liu estimator to address multicollinearity more effectively. The objective is for this estimator to inherit the stabilising properties of RR, which is specifically designed to accommodate ill-conditioned design matrices where ${\tilde{X}}^{⊤} \tilde{X}$ exhibits zero or near-zero eigenvalues, a hallmark of multicollinearity. The two-parameter Liu estimator introduces an additional parameter $k$ to provide finer control over the shrinkage effect, while retaining the adjustment governed by $d$. It is defined as

${\hat{θ}}^{Liu-type} (k, d) = {({\tilde{X}}^{⊤} \tilde{X} + k I_{p + 1})}^{- 1} ({\tilde{X}}^{⊤} y - d {\hat{θ}}^{RR}) .$

Liu (2003) showed that for any $k > 0$, there exists an optimal $d^{★ ★}$ such that

$MSE ({\hat{θ}}^{Liu-type} (k, d^{★ ★})) \leq MSE ({\hat{θ}}^{RR}) .$

Although this estimator offers greater flexibility, both parameters must typically be selected via CV in practical implementations. Joint optimisation of $(k, d)$ entails a considerable computational burden, and the additional estimation variability may increase the overall MSE. Thus, despite its appealing theoretical guarantees, the two-parameter Liu estimator is less practical for empirical applications, and we exclude it from our current implementation.

The remainder of the paper is organised as follows. Section 2 presents the main theoretical results. Section 3 reports an informative simulation study, while Section 4 provides a comprehensive real-data analysis. Concluding remarks are given in Section 5. All proofs and supporting technical details are collected in three appendices. Appendix A contains the proofs of all theoretical results. Appendix B provides additional details on the data-generating process underlying the simulation study. Appendix C includes further information on the datasets used in Section 4.

2. Main Results

This section presents all the theoretical results of the paper. We begin with Section 2.1, which provides a general overview of parity estimation, including the theoretical results underpinning the concept. Section 2.2 then shows how this foundational theory applies to multiple linear regression. The section concludes with Section 2.3, where we aim to enhance the explainability of the concepts developed in Section 2.1 and Section 2.2.

Before presenting the main results, we introduce some notation. The symbol ⪰ indicates that one symmetric matrix is greater than or equal to another in the Loewner ordering, meaning that their difference is positive semidefinite, whereas ≻ denotes strict dominance, meaning that their difference is positive definite. In particular, $Σ ≻ 0$ ($Σ ⪰ 0$) indicates that $Σ$ is positive definite (positive semidefinite). Additionally, $diag (A)$ denotes the diagonal matrix formed from the diagonal elements of the matrix $A$.

2.1. Parity Estimation

In statistics and machine learning, predictive models aim to relate a dependent target variable $Y$ to a covariate (feature) vector $X = (X_{1}, X_{2}, \dots, X_{p})$. Let $l : R \times R^{p} \times Θ \to R_{+}$ denote a loss function, where $Θ \subset R^{q}$ is the feasible set of the $q$-dimensional parameter vector $θ$, and $q > p$. In a multiple linear regression model comprising $p$ covariates and an intercept term, we have $q = p + 1$. As discussed in Section 2.2, when a synthetic term is introduced, this dimension $q$ is further extended to $q = p + 2$. The model parameters $θ$ are estimated by minimising the expected loss, $L (θ) : = E [l (Y, X; θ)]$. It is often assumed that $L : Θ \to (0, \infty]$ is differentiable, which we assume throughout the paper. The estimation problem therefore reduces to finding

$\hat{θ} \in \underset{θ \in Θ}{argmin} L (θ) .$

(10)

We now introduce the notion of parity estimation, which, to the best of our knowledge, has not previously been studied. A parity estimator is a vector $θ \in Θ$ such that $θ_{i} \neq 0$ for all $i = 1, 2, \dots, q$, and

$\frac{\partial L (θ)}{\partial θ_{k}} / \frac{L (θ)}{θ_{k}} = \frac{\partial L (θ)}{\partial θ_{l}} / \frac{L (θ)}{θ_{l}} \quad \text{for all } 1 \leq k < l \leq q .$

(11)

Condition (11) states that the elasticity of the loss function $L$ is the same across all components of the parameter vector $θ$. This is inspired by the theoretical foundation of capital allocation in linear risk portfolios; for a brief overview of this concept, we refer the reader to Asimit et al. (2011, 2013, 2019) and Tasche (1999). We are now ready to introduce the additional assumptions required for our main results, stated as Assumptions 1 and 2.

Assumption 1.

Assume that the feasible set $Θ \subset R^{q}$ is the parametric cone

$K_{q} (δ) : = \{θ \in R^{q} : δ_{i} θ_{i} > 0, i = 1, \dots, q\},$

where $δ \in {\{- 1, 1\}}^{q}$.

Assumption 2.

The loss function $L$ satisfies the following growth condition: there exist constants $M_{1}, M_{2} > 0$ such that

$L (θ) \geq M_{1} {∥ θ ∥}_{\infty} \quad \text{for all } θ \in Θ \text{ with } {∥ θ ∥}_{\infty} > M_{2}, \quad \text{where } {∥ θ ∥}_{\infty} = {max}_{i} | θ_{i} | .$

We are now ready to present our first main result, stated as Theorem 1, which provides a characterisation of parity estimation.

Theorem 1.

Under Assumption 1, the following results hold.

(i): For any $μ > 0$, any solution of (12)

$min_{θ \in K_{q} (δ)} (L (θ) - μ \sum_{k = 1}^{q} log (δ_{k} θ_{k})),$

(12)

is a parity estimator.
(ii): If $L$ is convex and Assumption 2 holds, then (12) admits a unique solution for any $μ > 0$.
(iii): Assume that $L (θ)$ is convex and homogeneous of order $τ \geq 1$. Then, for any $μ > 0$, (12) admits a unique solution $θ^{★} (μ)$, which satisfies

$\begin{matrix} L (θ^{★} (μ)) = \frac{q μ}{τ} \text{ and } θ^{★} (μ) = μ^{1 / τ} θ^{★} (1) . \end{matrix}$

Further, $\tilde{θ} \in K_{q} (δ)$ is a parity estimator satisfying (11) if and only if there exists $μ_{0} > 0$ such that $\tilde{θ} = μ_{0}^{1 / τ} θ^{★} (1)$.
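The link between the barrier problem (12) and the parity condition (11) can be seen from the first-order conditions. The following is only an informal sketch under the differentiability assumption already made, not a substitute for the formal proof of Theorem 1; it uses the fact that the derivative of $log (δ_{k} θ_{k})$ with respect to $θ_{k}$ equals $1 / θ_{k}$.

```latex
% Stationarity of the objective in (12) on the open cone K_q(\delta):
\frac{\partial L(\theta)}{\partial \theta_k} - \frac{\mu}{\theta_k} = 0
\quad\Longrightarrow\quad
\theta_k \frac{\partial L(\theta)}{\partial \theta_k} = \mu ,
\qquad k = 1, \dots, q .
% Dividing by L(\theta) > 0 gives the common elasticity required by (11):
\frac{\partial L(\theta)}{\partial \theta_k} \Big/ \frac{L(\theta)}{\theta_k}
  = \frac{\mu}{L(\theta)} \quad \text{for every } k .
% If L is homogeneous of order \tau, Euler's identity yields the loss level in (iii):
\tau L(\theta) = \sum_{k=1}^{q} \theta_k \frac{\partial L(\theta)}{\partial \theta_k} = q \mu
\quad\Longrightarrow\quad
L(\theta) = \frac{q \mu}{\tau} .
```

In words, the logarithmic barrier forces every component to contribute the same amount $μ$ to the sum in Euler's identity, which is why the loss level in part (iii) depends only on $q$, $μ$, and $τ$.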

Condition (11) may be relaxed to define a partial parity estimator. Specifically, for some $1 \leq q_{0} \leq q$, a vector $θ \in Θ$ is called a partial parity estimator if $θ_{i} \neq 0$ for all $i = 1, 2, \dots, q_{0}$, and

$\frac{\partial L (θ)}{\partial θ_{k_{1}}} / \frac{L (θ)}{θ_{k_{1}}} = \frac{\partial L (θ)}{\partial θ_{k_{2}}} / \frac{L (θ)}{θ_{k_{2}}} \quad \text{for all } 1 \leq k_{1} < k_{2} \leq q_{0} .$

(13)

Condition (13) states that the elasticity of the loss function $L$ is identical across a specified subset of components of the parameter vector $θ$.

We are now ready to present our second main result, stated as Proposition 1, which provides a characterisation of partial parity estimation.

Proposition 1.

Let Assumption 1 hold and let $t = (t_{q_{0} + 1}, t_{q_{0} + 2}, \dots, t_{q}) \geq 0_{q - q_{0}}$, where $0_{q - q_{0}}$ is a zero vector of dimension $q - q_{0}$. Then the following results hold.

(i): For any $μ > 0$ and $t \geq 0_{q - q_{0}}$, any solution of (14)

$min_{θ \in K_{q} (δ)} (L (θ) - μ \sum_{k = 1}^{q_{0}} log (δ_{k} θ_{k}) - μ \sum_{k = q_{0} + 1}^{q} t_{k} log (δ_{k} θ_{k})),$

(14)

is a partial parity estimator satisfying (13).
(ii): Assume that $L$ is convex and Assumption 2 holds. Then, (14) admits a unique solution for any $μ > 0$ and $t \geq 0_{q - q_{0}}$.
(iii): Assume that $L (θ)$ is convex and homogeneous of order $τ \geq 1$ in $θ$. Then, for any $μ > 0$ and $t \geq 0_{q - q_{0}}$, (14) admits a unique solution $θ^{★} (μ, t)$, which satisfies

$\begin{matrix} L (θ^{★} (μ, t)) = \frac{(q_{0} + 1^{⊤} t) μ}{τ} \text{ and } θ^{★} (μ, t) = μ^{1 / τ} θ^{★} (1, t) . \end{matrix}$

The proof of Proposition 1 is omitted, as it follows the same reasoning used in the proof of Theorem 1. Note that, unlike Theorem 1 (iii), Proposition 1 (iii) does not include an “if and only if” characterisation. Specifically, we cannot assert that, for any interior point $\tilde{θ} \in K_{q} (δ)$ satisfying (13), there exist $μ_{0} > 0$ and $t_{0} \geq 0_{q - q_{0}}$ such that $\tilde{θ} = μ_{0}^{1 / τ} θ^{★} (1, t_{0})$. This means that the parametric optimal solutions in (14), with parameters $(μ, t) \in R_{+ +} \times R_{+}^{q - q_{0}}$, may yield a large set of vectors satisfying (13), but not necessarily all of them. This contrasts with Theorem 1, where there is a one-to-one correspondence between the parametric set of optimal solutions in (12), parameterised by $μ \in R_{+ +}$, and the set of vectors satisfying (11). Consequently, it is practical to search for optimal solutions in (14) over $(μ, 0_{q - q_{0}})$ with $μ > 0$. That is, a partial parity estimator can be obtained as the parametric set of solutions in $μ$, given by

$min_{θ \in K_{q_{0}} (δ)} L (θ) - μ \sum_{k = 1}^{q_{0}} log (δ_{k} θ_{k}), \quad \text{with } μ > 0,$

(15)

since this formulation effectively minimises the loss with respect to the remaining components $(θ_{q_{0} + 1}, \dots, θ_{q})$.

Section 2.2 provides a broad overview of how parity estimation operates in the context of multiple linear regression, while also presenting some specific theoretical results for linear regression models.

2.2. Parity Estimation for Linear Regression

We begin by noting that the Parity Regression (PR), the parity estimation framework for linear regression, shares conceptual similarities with well-known penalised regression methods, as it introduces specific constraints. Figure 1 provides a simple geometric interpretation of PR estimation, reminiscent of the classic geometric representations of RR; for example, see Figure 3.11 in Hastie et al. (2009) or Figure 2 in Tibshirani (1996).

The illustration in Figure 1 provides an intuitive explanation for the presence of a single solution in each quadrant when $p = 1$, and the same reasoning extends naturally to cases with $p > 1$. Theorem 1 and Proposition 1 provide the theoretical justification for this geometric interpretation.

The PR method builds upon the structure of RR by introducing elasticity constraints that ensure the resulting loss function is homogeneous and convex in a specific form. This can be formalised by defining the total expected loss as

$\begin{matrix} RRSS (\hat{θ}; λ) & : = & \frac{1}{n} \sum_{i = 1}^{n} {(θ^{⊤} {\tilde{x}}_{i} - θ_{p + 1} y_{i})}^{2} + λ \sum_{k = 0}^{p} θ_{k}^{2} \\ & = & \frac{1}{n} {\hat{θ}}^{⊤} Z^{⊤} Z \hat{θ} + λ {\hat{θ}}^{⊤} diag (1_{p + 1}, 0) \hat{θ}, \end{matrix}$

(16)

where $\hat{θ} = (θ, θ_{p + 1})$ extends the parameter vector by including the synthetic term $θ_{p + 1}$ for the dependent variable, $1_{p + 1}$ is a vector of ones of length $p + 1$, and $Z$ is an $n \times (p + 2)$ matrix with the $i$th row given by $({\tilde{x}}_{i}^{⊤}, - y_{i})$ for all $1 \leq i \leq n$. Setting $θ_{p + 1} = 1$ recovers the RR objective in (4). To satisfy the homogeneity condition required by the parity estimator, as stated in Proposition 1, the synthetic term $θ_{p + 1}$ is treated as a free parameter during optimisation and is normalised to $θ_{p + 1} = 1$ afterwards, as in (21). This ensures that (16) is a homogeneous function of order $τ = 2$ that is convex in the extended parameter vector $\hat{θ} \in K_{p + 2} (δ)$. These properties fulfil the necessary conditions for applying parity estimation, thereby allowing us to directly extend parity estimation principles to the PR framework. To ensure that the loss function $RRSS$ behaves as intended, we impose the following conditions, stated formally in Assumption 3.

Assumption 3.

Let Assumption 1 hold. If $λ = 0$, then

$RRSS ((θ, 1); 0) > 0 \quad \text{for all } (θ, 1) \in K_{p + 2} (δ) .$

(17)

Note that $Z^{⊤} Z ⪰ 0$, and therefore, for any given $λ > 0$ and search cone $K_{p + 2} (δ)$, we have

$RRSS ((θ, 1); λ) > 0 \quad \text{for all } (θ, 1) \in K_{p + 2} (δ) .$

(18)

Condition (17) may fail only if there exists $\tilde{θ} \in R^{p + 1}$ such that

$y_{i} - {\tilde{θ}}_{0} - \sum_{k = 1}^{p} {\tilde{θ}}_{k} x_{i k} = 0 \quad \text{for all } 1 \leq i \leq n,$

(19)

with reference to (1). Specifically, (19) can arise in the following scenarios: (i) an imbalanced regression with $n < p + 1$, (ii) strong linear dependence among some features/predictors with $n > p + 1$, or (iii) a special data structure when $n = p + 1$ such that ${\hat{θ}}^{OLS} = \tilde{θ}$. In these cases, Assumption 3 implies

$RRSS ((θ, 1); 0) = 0 \quad \text{if and only if} \quad θ \in R^{p + 1} ∖ \{\tilde{θ}\} .$

Our main PR results for this section, namely Theorem 2 and Proposition 2, are now presented. These results establish the existence and uniqueness of the PR estimators under the specified elasticity-based constraints.

Theorem 2.

Let $λ \geq 0$ and $t \geq 0$ be such that Assumptions 1 and 3 hold.

(i): For any $μ > 0$, the unconstrained optimisation problem

$\begin{matrix} min_{(θ, θ_{p + 1}) \in K_{p + 2} (δ)} (RRSS (θ, θ_{p + 1}; λ) - μ \sum_{k = 0}^{p} log (δ_{k} θ_{k}) - μ t log (δ_{p + 1} θ_{p + 1})) \end{matrix}$

(20)

admits a unique solution, denoted by ${({(θ^{★} (λ, t, μ))}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤}$ , which satisfies the parity conditions in (11). This solution yields the PR estimate

$\begin{matrix} {\hat{θ}}^{P R} (λ, t) = \frac{{({(θ^{★} (λ, t, μ))}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤}}{θ_{p + 1}^{★} (λ, t, μ)} = {({({\hat{β}}^{P R} (λ, t))}^{⊤}, 1)}^{⊤}, \end{matrix}$

(21)

which is independent of $μ > 0$ for any fixed $(λ, t)$ . Furthermore, we have

$RRSS ((θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ)); λ) = \frac{(p + 1 + t) μ}{2},$

(22)

for any $(λ, t, μ) \in R_{+} \times [0, \infty) \times R_{+}^{★}$ , and

${\hat{β}}^{P R} (λ, t) = θ^{★} (λ, t, μ^{★}) = {(μ^{★})}^{1 / 2} θ^{★} (λ, t, 1),$

(23)

where $μ^{★} = {(θ_{p + 1}^{★} (λ, t, 1))}^{- 2}$. Furthermore, define ${\tilde{θ}}^{P R} (λ, t) : = {({({\tilde{β}}^{P R} (λ, t))}^{⊤}, 1)}^{⊤}$ as a PR estimate in the search cone $K_{p + 2} (δ)$, which satisfies (13). Then, ${\tilde{θ}}^{P R} (λ, t)$ is the unique solution of (20) with $μ = (\frac{2}{p + 1 + t}) RRSS ({\tilde{β}}^{P R} (λ, t), 1; λ)$. Consequently, we have ${\hat{β}}^{P R} (λ, t) = {\tilde{β}}^{P R} (λ, t)$.
(ii): For any $\tilde{μ} \in R$ , the constrained optimisation problem

$\begin{matrix} \{\begin{matrix} min_{(θ, θ_{p + 1}) \in K_{p + 2} (δ)} & RRSS (θ, θ_{p + 1}; λ) \\ s . t . & \sum_{k = 0}^{p} log (δ_{k} θ_{k}) + t log (δ_{p + 1} θ_{p + 1}) \geq \tilde{μ} \end{matrix} \end{matrix}$

(24)

admits a unique solution, denoted by ${({(θ^{★ ★} (λ, t, \tilde{μ}))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}))}^{⊤}$ . This solution yields the PR estimate

$\begin{matrix} {\hat{\hat{θ}}}^{P R} (λ, t) = \frac{{({(θ^{★ ★} (λ, t, \tilde{μ}))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}))}^{⊤}}{θ_{p + 1}^{★ ★} (λ, t, \tilde{μ})} = {({({\hat{\hat{β}}}^{P R} (λ, t))}^{⊤}, 1)}^{⊤}, \end{matrix}$

(25)

which is independent of $\tilde{μ}$ for any given $(λ, t)$ . Additionally, we have

${\hat{\hat{β}}}^{P R} (λ, t) = θ^{★ ★} (λ, t, {\tilde{μ}}^{★}) = e^{\frac{{\tilde{μ}}^{★}}{p + 1 + t}} θ^{★ ★} (λ, t, 0),$

(26)

where ${\tilde{μ}}^{★} = - (p + 1 + t) log θ_{p + 1}^{★ ★} (λ, t, 0)$ , and strong duality holds in (24).
(iii): For any given values of $μ > 0$ and $\tilde{μ} \in R$ , we have that ${\hat{β}}^{P R} (λ, t) = {\hat{\hat{β}}}^{P R} (λ, t)$ .

Theorem 2 (i) shows that up to $2^{p + 1}$ distinct PR estimates can be identified, each residing in one of the $2^{p + 1}$ possible search cones. This is consistent with Assumption 1, where estimators are sought within specific quadrants of the parameter space. For a given $p$, the PR estimates can therefore span all possible quadrants, and (20) provides a systematic procedure to obtain each potential PR solution in this setting.

Theorem 2 (ii) represents a constrained form of (i) and constitutes a special case of the framework discussed in Section 2.3. Theorem 2 (iii) establishes the uniqueness of each PR estimate and its independence from the normalising constants $μ$ and $\tilde{μ}$, thereby simplifying computation by eliminating the need for cross-validation over these constants. As a result, any suitable computational method for solving (20) produces a single, stable PR estimate within the chosen cone $K_{p + 2} (δ)$, and this holds for every parameter quadrant when $p > 1$.
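The statements in Theorem 2 (i) can be checked numerically on a small example. The sketch below solves the barrier problem (20) with a damped Newton iteration, which is our choice for illustration and not necessarily the algorithm used in the savvyPR package. It further assumes that the search cone is taken from the signs of the OLS fit, that the synthetic coefficient is positive, and that the data are synthetic; the function name `parity_regression` and its arguments are hypothetical.

```python
import numpy as np

def parity_regression(X, y, delta, lam=0.0, t=1.0, mu=1.0, tol=1e-9, max_iter=100):
    """Damped Newton sketch for (20), followed by the normalisation (21).

    delta holds the signs (+1 or -1) of theta_0, ..., theta_p; the synthetic
    coefficient theta_{p+1} is assumed to be positive.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X, -y])            # rows (x_tilde_i^T, -y_i)
    A = Z.T @ Z / n + lam * np.diag(np.r_[np.ones(p + 1), 0.0])
    signs = np.r_[np.asarray(delta, dtype=float), 1.0]
    w = np.r_[np.ones(p + 1), t]                        # weights of the log terms

    def objective(phi):
        return phi @ A @ phi - mu * np.sum(w * np.log(signs * phi))

    phi = signs.copy()                                  # interior starting point
    for _ in range(max_iter):
        grad = 2.0 * A @ phi - mu * w / phi
        if np.linalg.norm(grad) < tol:
            break
        hess = 2.0 * A + np.diag(mu * w / phi**2)
        step = np.linalg.solve(hess, -grad)
        f_old, s = objective(phi), 1.0
        for _halving in range(50):                      # backtracking line search
            cand = phi + s * step
            # Accept only points inside the cone that sufficiently decrease the objective.
            if np.all(signs * cand > 0) and \
                    objective(cand) <= f_old + 1e-4 * s * (grad @ step):
                phi = cand
                break
            s *= 0.5
        else:
            break                                       # no further progress possible
    return phi[:-1] / phi[-1], phi

# Illustrative synthetic data; all values below are made up for the example.
rng = np.random.default_rng(4)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * rng.normal(size=n)

X_tilde = np.column_stack([np.ones(n), X])
beta_ols = np.linalg.lstsq(X_tilde, y, rcond=None)[0]
delta = np.sign(beta_ols)                               # cone containing the OLS fit

beta_1, phi_1 = parity_regression(X, y, delta, mu=1.0)
beta_5, phi_5 = parity_regression(X, y, delta, mu=5.0)
```

By (21) and (23), the two calls with different values of $μ$ should return the same normalised coefficients up to numerical tolerance, and by (22) the loss at the unnormalised solution `phi_1` should be close to $(p + 1 + t) μ / 2$.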

We now present the second main result of this section, stated as Proposition 2.

Proposition 2.

Assume that ${\hat{β}}^{RR} (λ)$ and ${\hat{β}}^{OLS}$ contain only non-zero elements. Let

$δ^{RR} : = sgn ({\hat{β}}^{RR} (λ)) \quad \text{and} \quad δ^{OLS} : = sgn ({\hat{β}}^{OLS}),$

where the signum function is applied componentwise, with $sgn (a) = 1$ whenever $a > 0$ and $sgn (a) = - 1$ whenever $a < 0$. Thus, ${\hat{β}}^{RR} (λ) \in K_{p + 1} (δ^{RR})$ and ${\hat{β}}^{OLS} \in K_{p + 1} (δ^{OLS})$.

(i): Let $λ > 0$ and $t \geq 0$ for which Assumption 1 holds for $K_{p + 1} (δ^{R R})$. The PR estimate ${\hat{β}}^{P R} (λ, t) \in K_{p + 1} (δ^{R R})$ satisfies

$\prod_{k = 0}^{p} \frac{{\hat{β}}_{k}^{P R} (λ, t)}{{\hat{β}}_{k}^{R R} (λ)} \geq 1 .$

(27)
(ii): Let $λ = 0$ and $t \geq 0$ for which Assumption 1 holds for $K_{p + 1} (δ^{O L S})$ . The PR estimate ${\hat{β}}^{P R} (0, t) \in K_{p + 1} (δ^{O L S})$ satisfies

$\prod_{k = 0}^{p} \frac{{\hat{β}}_{k}^{P R} (0, t)}{{\hat{β}}_{k}^{O L S}} \geq 1 .$

(28)

Proposition 2 recommends using either the OLS or RR estimate as a starting point for selecting the search cone in PR estimation. Choosing the cone defined by OLS,

{\tilde{θ}}^{O L S} = ({\hat{β}}^{O L S}, 1) = ({\hat{β}}^{R R} (0), 1)

, provides a non-regularised baseline. Alternatively, selecting the cone containing the RR estimate,

{\tilde{θ}}^{R R} = ({\hat{β}}^{R R} (λ^{★}), 1)

, offers a regularised approach, where

λ^{★}

is chosen via CV, for example, using the glmnet package in R.
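The cone selection described above can be sketched in a few lines. This is only an illustration under stated assumptions: the cone is taken to be the open set of vectors whose components share the signs in δ (consistent with the terms log(δ_k θ_k) used in the proofs), the helper names `sgn_vector`, `in_cone` and `ratio_product` are hypothetical, and the numerical vectors are made up rather than produced by an actual PR fit.

```python
def sgn_vector(beta):
    """Componentwise sign; the proposition assumes no component is exactly zero."""
    if any(v == 0 for v in beta):
        raise ValueError("sign cone is undefined when a coefficient is zero")
    return [1 if v > 0 else -1 for v in beta]


def in_cone(theta, delta):
    """True when delta_k * theta_k > 0 for every k (assumed cone definition)."""
    return all(d * v > 0 for d, v in zip(delta, theta))


def ratio_product(beta_pr, beta_ref):
    """Product of componentwise ratios, the quantity bounded in (27) and (28)."""
    prod = 1.0
    for a, b in zip(beta_pr, beta_ref):
        prod *= a / b
    return prod


# Hypothetical coefficient vectors, for illustration only.
beta_ols = [0.8, -1.5, 0.3]
beta_pr = [1.0, -1.4, 0.4]

delta_ols = sgn_vector(beta_ols)          # sign pattern defining the search cone
same_cone = in_cone(beta_pr, delta_ols)   # candidate shares the OLS sign pattern
prod = ratio_product(beta_pr, beta_ols)   # compare against the bound of 1
```

Because both vectors lie in the same cone, every ratio is positive, so the product is well defined and can be compared directly with the lower bound of 1.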

2.3. Parity Estimation and Regression—Further Explainability

We have established the main theory of parity estimation and PR in the previous two sections. We now aim to enhance the explainability of these concepts by providing a higher-level description of the theory and by linking PR to well-known penalised regression methods, such as RR and LASSO. To this end, we introduce the concept of the Generalised Weighted Mean (GWM).

The GWM generalises several well-known averaging operations and depends on a parameter r. For a given

x \in R^{p + 1}

and a vector of weights

b = {(b_{0}, b_{1}, \dots, b_{p})}^{⊤}

, where

b_{i} \geq 0

and

1^{⊤} b = 1

, the GWM of order r is defined as

m_{r} (x; b) = {(\sum_{k = 0}^{p} b_{k} {| x_{k} |}^{r})}^{\frac{1}{r}}, for r \in R \cup {\pm \infty} .

It holds that

m_{r} (x; b) \leq m_{s} (x; b)

for all

- \infty \leq r < s \leq \infty

. The limiting case

r = 0

, corresponding to the weighted geometric mean, is given by

m_{0} (x; b) : = lim_{r \to 0} m_{r} (x; b) = \prod_{k = 0}^{p} {| x_{k} |}^{b_{k}} = exp (\sum_{k = 0}^{p} b_{k} log | x_{k} |) .

For

r = \pm \infty

, we obtain

m_{- \infty} (x; b) = min_{0 \leq i \leq p} | x_{i} | and m_{\infty} (x; b) = max_{0 \leq i \leq p} | x_{i} | .

The case

r = - 1

yields the weighted harmonic mean, defined as

m_{- 1} (x; b) = {(\sum_{i = 0}^{p} \frac{b_{i}}{| x_{i} |})}^{- 1} .
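A minimal sketch of the GWM and its special cases may help fix ideas. The function name `gwm` and the example vectors are illustrative assumptions; the code assumes all entries of x are non-zero and all weights are strictly positive, so that the limiting cases coincide with the formulas above.

```python
import math


def gwm(x, b, r):
    """Generalised weighted mean m_r(x; b) of the absolute values of x."""
    a = [abs(v) for v in x]
    if r == math.inf:
        return max(a)
    if r == -math.inf:
        return min(a)
    if r == 0:
        # Limiting case r -> 0: weighted geometric mean.
        return math.exp(sum(w * math.log(v) for w, v in zip(b, a)))
    return sum(w * v ** r for w, v in zip(b, a)) ** (1.0 / r)


x = [0.5, -2.0, 4.0]      # hypothetical vector
b = [0.2, 0.3, 0.5]       # non-negative weights summing to one
orders = [-math.inf, -1, 0, 1, 2, math.inf]
means = [gwm(x, b, r) for r in orders]
```

Evaluating `means` for increasing orders gives a non-decreasing sequence, which is the monotonicity property stated above.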

We introduce the Generalised Weighted Mean Constrained (GWMC) estimation framework and note that an equivalent formulation in the context of portfolio theory is discussed in (Asimit et al., forthcoming-a, 2025). The GWMC approach seeks to minimise a given loss function

L

subject to a constraint on the GWM of the regression parameters:

\{\begin{matrix} min_{θ \in R^{p + 1}} & L (θ) \\ s.t. & m_{r} (θ; b) \leq ϵ, for r \geq 1, \\ & m_{r} (θ; b) \geq ϵ, for r < 1, \end{matrix}

(29)

where

ϵ > 0

is a fixed constant. The GWM function

m_{r} (θ; b)

provides flexibility in regularising the parameters through the choice of the order r and weighting vector

b

.

Note that

m_{r} (θ; b)

is convex in

θ \in K_{p + 1} (δ)

when

r \geq 1

, and therefore encompasses a broad class of regularisation schemes. In particular, RR and LASSO arise as special cases of (29) when the loss functional

L

corresponds to a regression setting. Specifically, RR can be formulated within the GWMC framework with

r = 2

and

θ \in K_{p + 1} (δ)

:

m_{2} (θ; b) = {(\sum_{k = 0}^{p} b_{k} {| θ_{k} |}^{2})}^{\frac{1}{2}} \leq \sqrt{\tilde{λ}},

where equal weights are assumed. This formulation coincides with the usual

L_{2}

-norm constraint in (6). Similarly, LASSO is representable within the GWMC framework with

r = 1

, which imposes an

L_{1}

-norm constraint on

θ \in K_{p + 1} (δ)

. In this case, the GWMC constraint becomes

m_{1} (θ; b) = \sum_{k = 0}^{p} b_{k} | θ_{k} | \leq \tilde{t} .
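As a short worked note, and assuming equal weights b_k = 1/(p + 1) in both cases (the text states this explicitly only for RR), the two GWM constraints reduce to rescaled norm constraints:

```latex
m_{2}(\theta; b) \le \sqrt{\tilde{\lambda}}
  \iff \sum_{k=0}^{p} \theta_{k}^{2} \le (p+1)\,\tilde{\lambda},
\qquad
m_{1}(\theta; b) \le \tilde{t}
  \iff \sum_{k=0}^{p} |\theta_{k}| \le (p+1)\,\tilde{t}.
```

The factor p + 1 is absorbed into the constraint level, so the feasible sets are the usual L_2 and L_1 balls.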

The Parity Estimator introduced in (11) and formulated in (12) can be viewed as a limiting case of the GWMC framework in (29) with

r = 0

and

θ \in K_{q} (δ)

, which leads to a logarithmic constraint. Specifically, for equal weights

b_{k} = \frac{1}{q}

, the constraint becomes

\begin{matrix} m_{0} (θ; b) & : = lim_{r \to 0} m_{r} (θ; b) \\ = exp (\sum_{i = 1}^{q} b_{i} log | θ_{i} |) \geq e^{μ}, \end{matrix}

where

μ

is a lower-bound parameter. Similarly, the Partial Parity Estimator defined in (13) and reformulated in (14) can also be interpreted within the GWMC framework with

r = 0

and

θ \in K_{q} (δ)

. In this case, the logarithmic constraint takes the weighted form

m_{0} (θ; b) : = lim_{r \to 0} m_{r} (θ; b) = exp (\sum_{i = 1}^{q_{0}} b_{i} log | θ_{i} | + \sum_{i = q_{0} + 1}^{q} b_{i} log | θ_{i} |) \geq e^{\tilde{μ}},

where

\tilde{μ}

is the corresponding lower-bound parameter. The weights

b_{k}

are defined as

b_{k} = \{\begin{matrix} \frac{1}{q_{0} + 1^{⊤} t}, & for k = 1, \dots, q_{0}, \\ \frac{t_{k}}{q_{0} + 1^{⊤} t}, & for k = q_{0} + 1, \dots, q . \end{matrix}

In summary, the logarithmic constraint underlying parity estimation produces a balanced regularisation effect, distributing the elasticity of the loss function uniformly across the selected parameters. When embedded within the GWMC framework, the parity estimator is conceptually consistent with penalised regression methods such as RR and LASSO, which correspond to the cases where

r = 2

and

r = 1

, respectively. The fundamental similarity between RR, LASSO and PR lies in their shared objective, to control both the magnitude and distribution of model parameters by imposing constraints on the weighted mean functional. This connection highlights how PR extends the classical regularisation principle by introducing fairness-oriented constraints within a unified GWMC framework.

3. Simulation Study

In this section, we conduct a simulation study to evaluate the finite-sample performance of the PR estimators relative to their primary competitors: (i) OLS as defined in (2), (ii) RR as defined in (5), and (iii) the Liu estimator as defined in (9). Our objective is to assess the robustness of the PR framework under varying degrees of multicollinearity and heteroscedasticity. We exclude the LASSO estimator from the analysis, as our focus is on dense parameter space structures and variable selection is not the purpose of this paper.

This section is organised into two parts. Section 3.1 describes the simulation design and data-generating process and introduces the performance measures used to compare the estimators. Section 3.2 presents and discusses the simulation results.

3.1. Experimental Setup and Methodology

To evaluate the performance of the estimator under different data structures, we employ a data generation process (DGP) designed to simulate complex regression environments characterised by severe multicollinearity and heteroscedasticity. A detailed characterisation of the simulation steps, including the construction of the feature matrix

X \in R^{n \times p}

and the heteroscedastic response variable Y, is provided in Appendix B.

We compare several distinct estimation methodologies. The OLS estimator serves as the unregularised baseline, while the shrinkage estimators, namely RR and Liu, are also included. The PR framework is implemented in two computationally efficient variants, which differ primarily in their tuning mechanisms. First, the PR estimator with t-tuning, denoted by

{PR}_{t}

, is directly motivated by Theorem 2. Its key feature is that, for any fixed

t \geq 0

, the PR estimate is independent of the normalising constant

μ

. Here, t acts as the relative elasticity weight for the target variable and is selected via CV. Second, the PR estimator with c-tuning, denoted by

{PR}_{c}

, allocates a fixed loss contribution to each predictor. Mathematically, this corresponds to the budget-based objective function, where the logarithmic barrier for the p predictors is weighted by c, while the response variable’s contribution is determined by

1 - (p + 1) c

. To ensure positive risk allocations, c is constrained such that

0 \leq c < 1 / (p + 1)

. Although c and t are functionally connected through their role in balancing loss contributions,

{PR}_{c}

operates within a strictly bounded domain, thereby providing an alternative numerical approach to the risk-parity problem. We benchmark both

{PR}_{t}

and

{PR}_{c}

against OLS, RR, and the Liu estimator, defining the initial search cone for the PR algorithms using the signs of the OLS coefficients, as indicated by Proposition 2. All tuning parameters (c, t,

λ

, or d) are selected via 10-fold CV.

To evaluate the accuracy of the proposed estimators in the simulation study, we measure their performance using the

L_{2}

-distance between the true regression parameter vector

β

and the estimated vector

\hat{β}

. This corresponds to the estimated MSE of

β

, which is appropriate since low estimation error aligns with low theoretical MSE. For a single simulation run, the

L_{2}

-error is defined as

\begin{matrix} L_{2} = {∥ \hat{β} - β ∥}_{2} = \sqrt{\sum_{j = 1}^{m} {({\hat{β}}_{j} - β_{j})}^{2}}, \end{matrix}

(30)

where m denotes the number of covariates, including the intercept.

We report the average

L_{2}

-distance across

N = 1000

repetitions for each scenario. To assess the inherent variability and spread of these distances, we also report the standard deviation (SD). Following the convention in our numerical results, the SD is expressed as a percentage to facilitate comparison of the performance of different estimators and is calculated as follows:

SD = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} {(L_{2}^{(i)} - {\bar{L}}_{2})}^{2}}

(31)

where

L_{2}^{(i)}

represents the

L_{2}

-distance in the i-th simulation repetition, and

{\bar{L}}_{2}

is the sample mean of these distances across all N repetitions. A lower average

L_{2}

-distance indicates greater accuracy in estimating the true regression parameters.
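The two summary measures in (30) and (31) can be computed as in the following sketch. The function names and the three hypothetical estimates are assumptions made for illustration; the conversion of the SD to a percentage mentioned above is not reproduced here.

```python
import math
import statistics


def l2_error(beta_hat, beta):
    """Euclidean distance between estimated and true coefficients, as in (30)."""
    return math.sqrt(sum((bh - b) ** 2 for bh, b in zip(beta_hat, beta)))


def summarise(errors):
    """Average L2 error and sample SD with the N - 1 denominator, as in (31)."""
    return statistics.mean(errors), statistics.stdev(errors)


beta_true = [1.0, 2.0, -1.0]
# Hypothetical estimates from three simulation repetitions.
estimates = [[1.1, 1.9, -1.0], [0.7, 2.4, -1.0], [1.0, 2.0, -0.5]]

errors = [l2_error(bh, beta_true) for bh in estimates]
mean_err, sd_err = summarise(errors)
```

In the study itself the same summary would be taken over the N = 1000 repetitions of each scenario.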

3.2. Discussion of Simulation Results

We provide a comprehensive discussion of the comparative performance of the two PR variants (

{PR}_{c}

and

{PR}_{t}

) relative to the OLS, RR, and the Liu estimators in Table 1 and Table 2. The detailed results are reported in Table 1, while the aggregated results are summarised in Table 2 to facilitate identification of the main trends observed in Table 1.

Overall, the simulation results demonstrate that PR consistently outperforms OLS and frequently outperforms the shrinkage estimators by effectively stabilising parameter estimates in high-correlation environments. While OLS exhibits substantial variance inflation as

| ρ |

increases, both

{PR}_{c}

and

{PR}_{t}

achieve their strongest performance under high negative correlation (

ρ = - 0.75

and

- 0.5

), frequently attaining the lowest

L_{2}

-distances across all panels. Even in settings with strong positive correlation, PR remains superior or highly competitive relative to OLS. Moreover, as dimensionality increases (Panel C,

m = 25

),

{PR}_{c}

emerges as the dominant estimator across nearly all correlation levels. These findings confirm that the parity constraint provides an effective regularisation mechanism for high-dimensional models characterised by dense parameter structures and strong interdependence among features. It is important to note that when the number of covariates is very low (

m = 2

) and the sample size is small (

n / m = 10

and 25), the PR methods sometimes perform similarly to, or worse than, traditional shrinkage methods. In such low-dimensional environments, the risk-balancing mechanism of PR provides less marginal benefit. However, as the dimensionality and sample size increase, the structural advantages of PR become evident. While PR consistently outperforms OLS, the traditional shrinkage estimators (RR and Liu) perform quite similarly to the PR framework in scenarios with a lower number of covariates or moderate volatility. The true divergence in performance occurs in high-dimensional settings (e.g.,

m = 25

), where the parity constraints prevent the overshrinkage or instability that affects RR and Liu.

Table 2 provides a synthesised overview of the results reported in Table 1 by aggregating the “best” and “second-best” performances across all 60 scenarios. The summary highlights the consistency of the PR framework: the combined PR approach is the top-performing model in 41.7% of cases and accounts for 43.3% of all best and second-best rankings (52 out of 120).

In summary, the simulation results indicate that the PR estimator is comparable to the benchmark model and performs exceptionally well in settings characterised by severe multicollinearity and high dimensionality. By imposing structural balance on the coefficients via parity constraints, the proposed method achieves a superior bias–variance trade-off compared to traditional shrinkage estimators across various dependency structures.

4. Real Data Analysis

We evaluate the empirical performance of the PR estimators (

{PR}_{c}

,

{PR}_{t}

) using data from the West Texas Intermediate (WTI) and Brent crude oil markets. These commodities serve as the primary global benchmarks for oil pricing and are characterised by high volatility and frequent structural shifts. This environment provides a setting to assess the stability of the parity framework relative to OLS, RR, and the Liu estimator across diverse market regimes. Section 4.1 details the dataset and the underlying factor model. Section 4.2 outlines the OOS evaluation structure and corresponding error metrics before presenting the empirical results.

4.1. Background and Dataset Description

Our analysis utilises monthly total returns for WTI and Brent crude oil sourced from Bloomberg. Although the raw data begin in January 1980, the necessity of observing all factors concurrently restricts our final sample periods to: (i) WTI, August 1988 to September 2024 (434 observations); and (ii) Brent, April 1998 to September 2024 (318 observations). A detailed technical discussion of benchmark selection and data source characteristics is provided in Appendix C.1.

Following the methodology of Sakkas and Tessaromatis (2020), we model monthly excess returns using a factor-based specification. The model is given by

y_{t + 1} = β_{0} + \sum_{j = 1}^{p} β_{j} X_{j, t} + ϵ_{t + 1},

(32)

where

y_{t + 1}

denotes the monthly excess return (over the risk-free rate) at time

t + 1

, and

X_{j, t}

represents the j-th normalised factor observed at time t.

For WTI, the model includes nine factors (

p = 9

): Momentum, Basis, Basis Momentum, Skewness, Inflation Beta, Volatility, Hedging Pressure, Open Interest, and Value. For Brent, Hedging Pressure is excluded due to data limitations, resulting in

p = 8

. To guarantee numerical stability and comparability across variables measured in different units, all covariates are uniformly scaled to the interval

[0, 1]

, whereas the response variable y remains unscaled. Definitions and economic motivations for each factor are provided in Appendix C.2, and the explicit construction formulas follow the framework established in Sakkas and Tessaromatis (2020).
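One common way to map a covariate to the interval [0, 1] is min-max scaling, sketched here. This is an assumption about the scaling step, since the text does not state which transformation was used, and the helper name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(column):
    """Map a list of values linearly onto [0, 1] (assumed scaling rule)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        raise ValueError("a constant column cannot be scaled to [0, 1]")
    return [(v - lo) / (hi - lo) for v in column]


factor = [3.0, -1.0, 7.0]          # hypothetical raw factor values
scaled = min_max_scale(factor)     # smallest value maps to 0, largest to 1
```

The response variable would be left untouched, in line with the description above.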

Figure A1 and Figure A2 present the monthly returns and cumulative performance of both benchmarks. To account for market shifts over time, we apply the endogenous structural breakpoint test of Bai and Perron (2003) to divide the entire sample into distinct economic periods. This procedure identifies 12 regimes for WTI and 11 for Brent, including major high-volatility episodes such as: (i) the Global Financial Crisis and subsequent recovery, spanning September 2008 to May 2011 for WTI and September 2008 to January 2011 for Brent; and (ii) the COVID-19 pandemic and its aftermath, covering March 2020 to July 2022 for WTI and February 2020 to July 2022 for Brent. A complete timeline of these periods, along with the specific market events that define them, is given in Appendix C.3.

4.2. Data Analysis

The OOS performance of all estimators is evaluated using an expanding training-window framework. This approach is anchored to the structural regimes identified in Section 4.1 and further detailed in Appendix C.3. For each transition from Period i to Period

i + 1

, the five regression models are estimated using all available data up to the end of Period i. Their predictive performance is then evaluated over the testing window corresponding to the entirety of Period

i + 1

.
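The expanding-window scheme just described can be sketched as follows. The function name `expanding_splits`, the convention that each entry of `period_ends` is the exclusive end index of a period, and the example boundaries are all illustrative assumptions rather than the regimes identified in the paper.

```python
def expanding_splits(period_ends):
    """Return (train, test) index ranges for each transition Period i -> Period i+1.

    period_ends[i] is the exclusive end index of the i-th period (0-based)
    within the full sample, so training always starts at observation 0.
    """
    splits = []
    for i in range(len(period_ends) - 1):
        train = range(0, period_ends[i])                  # all data up to end of Period i
        test = range(period_ends[i], period_ends[i + 1])  # the whole of Period i+1
        splits.append((train, test))
    return splits


period_ends = [60, 95, 130]   # hypothetical regime boundaries
splits = expanding_splits(period_ends)
```

Each estimator would be refitted on the `train` range and scored on the corresponding `test` range, so that no observation from the evaluated period enters the fit.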

Given a testing window of length n, observed excess returns

y_{t}

, and corresponding predictions

{\hat{y}}_{t}

, OOS performance is assessed using three standard metrics:

\begin{matrix} MSE = \frac{1}{n} \sum_{t = 1}^{n} {(y_{t} - {\hat{y}}_{t})}^{2}, RMSE = \sqrt{MSE}, MAE = \frac{1}{n} \sum_{t = 1}^{n} | y_{t} - {\hat{y}}_{t} | . \end{matrix}

(33)

Root Mean Squared Error (RMSE) imposes a quadratic loss, thereby penalising large forecast errors more heavily and highlighting model instability during market shocks. In contrast, the Mean Absolute Error (MAE) applies a linear loss and is therefore more robust to extreme observations. While OLS and shrinkage estimators may experience variance inflation during turbulent market regimes, the PR framework incorporates a parity-based constraint designed to enhance predictive stability. A detailed discussion of the comparative stability of PR relative to penalised methods across these regimes is provided in Appendix C.4.
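The error metrics can be computed as in this short sketch, which takes the MSE to be the usual mean of squared forecast errors (an assumption stated here for completeness). The function name `oos_metrics` and the return series are hypothetical.

```python
import math


def oos_metrics(y, y_hat):
    """MSE, RMSE and MAE over one testing window."""
    n = len(y)
    errors = [a - b for a, b in zip(y, y_hat)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


# Hypothetical excess returns and forecasts; the last month is a large miss.
y = [0.02, -0.10, 0.05, 0.30]
y_hat = [0.01, -0.02, 0.04, 0.05]
metrics = oos_metrics(y, y_hat)
```

In this example the single large error in the final month pulls the RMSE well above the MAE, which illustrates the different sensitivity of the two losses to extreme observations.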

Before evaluating the OOS forecasting performance, it is important to note the dependence structure among the covariates. An analysis of the overall correlation matrices provided in Appendix C.5 reveals moderate to high multicollinearity among several factors in both the WTI and Brent datasets. This interrelated parameter structure confirms the necessity of employing regularised estimators, including RR, Liu, and PR, to stabilise the coefficient estimates and prevent the variance inflation that can degrade the OOS performance of OLS in such conditions.

Table 3 and Table 4 present the OOS predictive performance for WTI and Brent crude oil excess returns across 12 and 11 distinct economic periods, respectively. Although no single estimator consistently outperforms across all periods, the results demonstrate that our PR estimators provide greater stability during regimes of extreme market volatility and structural shifts, where traditional benchmarks often exhibit pronounced instability.

For the WTI dataset, the summary rows indicate that

{PR}_{c}

attains the highest number of best and second-best RMSE performances across the evaluated periods. Examining specific high-volatility events, the performance of the estimators varies. During the Global Financial Crisis (Period 6), all methods struggle, with OLS achieving the lowest RMSE of 38.94%; however,

{PR}_{c}

secures the second-best overall position, notably outperforming the other shrinkage estimators. This illustrates that even when standard shrinkage methods falter, the parity constraint enhances predictive robustness. In the subsequent recovery phase (Period 7),

{PR}_{t}

attains the lowest RMSE of 11.27%, outperforming all other competitors. During the COVID-19 pandemic (Period 11), the parity-based framework demonstrates structural resilience:

{PR}_{c}

achieves the minimum RMSE of 28.51%, while

{PR}_{t}

ranks second-best at 29.00%, whereas RR, Liu, and OLS fail to maintain comparable stability.

A similar pattern is observed for the Brent dataset, where the summary counts indicate that

{PR}_{c}

again secures the highest number of best RMSE performances, while

{PR}_{t}

achieves the most second-best rankings. During the Financial Crisis (Period 5),

{PR}_{c}

emerges as the most robust estimator, delivering the lowest RMSE of 66.33%. In contrast, traditional shrinkage methods such as RR and Liu suffer from substantial errors, performing even worse than the second-best OLS. During the subsequent recovery phase (Period 6), RR attains the lowest RMSE at 22.98%, with

{PR}_{c}

closely following as the second-best model at 26.13%. Most notably, during the extreme volatility of the COVID-19 pandemic (Period 10),

{PR}_{c}

proves highly reliable, achieving a remarkably low RMSE of 28.44%. In this extreme regime, conventional methods fail: OLS produces an RMSE of 163.04%, while RR and Liu incur errors of 61.51% and 72.91%, respectively.

Consistent with our findings from the simulation study, the relative performance of

{PR}_{t}

and

{PR}_{c}

varies across market regimes. While

{PR}_{t}

exhibits particular strength in capturing trends during the WTI recovery phase,

{PR}_{c}

demonstrates superior robustness in the most volatile and ill-conditioned periods, such as the Financial Crisis and the COVID-19 pandemic, across both commodities. Furthermore, the empirical results confirm our previous observations regarding RR and Liu. During relatively stable market regimes, RR and Liu perform quite competitively and yield predictive accuracy similar to that of PR. However, during the aforementioned periods of extreme volatility and structural breaks, these estimators frequently become unstable due to their reliance on a single global penalty parameter. By distributing risk evenly across the model parameters, PR prevents overshrinkage and maintains robust OOS predictions even in these chaotic environments. Overall, the results presented in Table 3 and Table 4 underscore that parity-based regression provides a critical mechanism for ensuring parameter stability, rendering the PR framework a more reliable alternative to conventional shrinkage methods, such as RR or Liu, when forecasting oil returns under severe global shocks.

5. Conclusions

We have introduced Parity Regression, a novel multiple linear regression framework that replaces aggregate error minimisation with an elasticity-based principle that distributes prediction error evenly across model parameters. This formulation induces a structurally balanced regularisation mechanism that is particularly well-suited to noisy environments, including time series settings characterised by structural shifts and evolving dynamics. We provided a rigorous theoretical characterisation of the new estimator, establishing its existence, uniqueness, and structural properties, and demonstrated that it can be embedded within a unified Generalised Weighted Mean Constrained framework that also encompasses classical penalised and shrinkage estimators.

By reinterpreting regularisation through elasticity balancing rather than norm penalisation alone, Parity Regression offers a conceptually distinct yet mathematically coherent extension of existing methodology. Theoretical guarantees are corroborated by simulation studies and real-data applications, which confirm its stability and competitive performance. Collectively, these results position Parity Regression as a substantive methodological advancement with strong foundations for further analytical and high-dimensional development.

Author Contributions

Conceptualization, V.A. and Z.C.; Methodology, V.A., Z.C. and P.M.; Software, Z.C. and B.I.; Validation, V.A., B.I. and P.M.; Formal analysis, V.A.; Investigation, B.I.; Resources, B.I.; Data curation, Z.C.; Writing – original draft, V.A., Z.C. and P.M.; Visualization, Z.C.; Supervision, V.A. and P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to proprietary licensing restrictions from the third-party provider (Bloomberg Finance L.P.). The data were accessed via a university institutional terminal, which prohibits the redistribution of raw data to the public.

Acknowledgments

The authors express their sincere gratitude to Alexandru Bădescu (University of Calgary) for his valuable comments and constructive guidance throughout the theoretical development and implementation phases of this research. His insights have materially contributed to the rigour and clarity of the final manuscript. The third author would like to thank the Informational Buildup Foundation (IBF) and the Simion Stoilow Institute of Mathematics of the Romanian Academy for the support provided through an IBF Research Fellowship.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proofs

Appendix A.1. Proof of Theorem 1

We first prove part (i). The first-order necessary conditions for any solution

θ^{★}

of (12) lead to

\begin{matrix} θ_{k}^{★} \frac{\partial L}{\partial θ_{k}} (θ^{★}) = μ, for all 1 \leq k \leq q, \end{matrix}

which further implies that

θ^{★}

satisfies (11).

For part (ii), we first demonstrate that (12) has a solution. Let

f (θ; μ)

be the objective function of (12). We begin by showing that

\begin{matrix} lim_{{∥ θ ∥}_{\infty} \to \infty} f (θ; μ) = + \infty for any μ > 0 . \end{matrix}

(A1)

Indeed,

\begin{matrix} f (θ; μ) & = L (θ) - μ \sum_{k = 1}^{q} log (δ_{k} θ_{k}) \\ & \geq M_{1} {∥ θ ∥}_{\infty} - μ \sum_{k = 1}^{q} log (\frac{δ_{k} θ_{k}}{{∥ θ ∥}_{\infty}}) - μ q log {∥ θ ∥}_{\infty}, for {∥ θ ∥}_{\infty} sufficiently large, \\ & \geq M_{1} {∥ θ ∥}_{\infty} - μ q log {∥ θ ∥}_{\infty} \\ & \to \infty as {∥ θ ∥}_{\infty} \to \infty . \end{matrix}

Next, we show that

\begin{matrix} lim_{{∥ θ ∥}_{- \infty} \to 0} f (θ; μ) = + \infty, for any μ > 0, \end{matrix}

where

{∥ θ ∥}_{- \infty} = {min}_{i} | θ_{i} |

. This follows from

lim_{{∥ θ ∥}_{- \infty} \to 0} \sum_{k = 1}^{q} log (δ_{k} θ_{k}) = - \infty

and the fact that

L (θ) > 0

for any

θ \in K_{q} (δ)

. Thus, there exist

ϵ > 0

and

K > 0

such that

\begin{matrix} inf_{θ \in K_{q} (δ)} f (θ; μ) = inf_{θ \in B_{ϵ, K}} f (θ; μ), \end{matrix}

where

B_{ϵ, K} = \{θ \in K_{q} (δ) : {∥ θ ∥}_{- \infty} \geq ϵ, {∥ θ ∥}_{\infty} \leq K\} .

The conclusion follows since

B_{ϵ, K}

is compact and f is continuous.

Finally, note that

f (θ; μ)

is strictly convex in

θ

since

μ > 0

,

L

is convex and

log (δ_{k} θ_{k})

is strictly concave in

θ_{k}

for all k. Hence, (12) has a unique solution.

For part (iii), we begin by proving that (12) admits a unique solution. Since the proof is the same as part (ii) except for (A1), we only show that (A1) holds. Since

L

is positive and homogeneous of order

τ

, then for any

0 < ζ < τ

, we have that

L (θ) \geq {∥ θ ∥}_{\infty}^{τ - ζ}

for those

θ

such that

{∥ θ ∥}_{\infty}

is sufficiently large. Thus,

\begin{matrix} f (θ; μ) & = L (θ) - μ \sum_{k = 1}^{q} log (δ_{k} θ_{k}) \\ & \geq C {∥ θ ∥}_{\infty}^{τ} - μ q log {∥ θ ∥}_{\infty}, for {∥ θ ∥}_{\infty} sufficiently large, \\ & \to \infty as {∥ θ ∥}_{\infty} \to \infty, \end{matrix}

(A2)

which gives the needed result.

We now show that

L (θ^{★} (μ)) = \frac{q μ}{τ}

for all

μ > 0

. Since

θ^{★} (μ)

is a solution of (12), the first-order conditions give

θ_{k} \frac{\partial L (θ)}{\partial θ_{k}} |_{θ = θ^{★} (μ)} = μ for any 1 \leq k \leq q .

The latter and Euler’s Homogeneous Function Theorem yield

L (θ^{★} (μ)) = q μ / τ

.
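For completeness, the step combining the first-order conditions with Euler's Homogeneous Function Theorem can be written out as follows, using only the homogeneity of order τ assumed in part (iii):

```latex
\tau\, L(\theta^{\star}(\mu))
  = \sum_{k=1}^{q} \theta_{k}\,
    \frac{\partial L(\theta)}{\partial \theta_{k}}\bigg|_{\theta = \theta^{\star}(\mu)}
  = \sum_{k=1}^{q} \mu
  = q\mu
\quad\Longrightarrow\quad
L(\theta^{\star}(\mu)) = \frac{q\mu}{\tau}.
```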

Next, we show that

θ^{★} (μ) = μ^{1 / τ} θ^{★} (1)

for all

μ > 0

. The homogeneity of

L

implies that

\begin{matrix} f (μ^{- 1 / τ} θ; 1) = \frac{1}{μ} f (θ; μ) + \frac{q}{τ} log μ for any θ \in K_{q} (δ) and μ > 0, \end{matrix}

which further leads to

\begin{matrix} \underset{θ}{argmin} f (μ^{- 1 / τ} θ; 1) = \underset{θ}{argmin} f (θ; μ) = θ^{★} (μ), \end{matrix}

as

μ > 0

. It can easily be seen that

\begin{matrix} μ^{\frac{1}{τ}} θ^{★} (1) = \underset{θ}{argmin} f (μ^{- 1 / τ} θ; 1) \end{matrix}

which shows that

θ^{★} (μ) = μ^{1 / τ} θ^{★} (1)

.
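The statements of part (iii) can be checked numerically on a toy problem. The sketch below is not the authors' algorithm: it assumes a hypothetical quadratic loss L(θ) = ½ θᵀAθ with a positive definite matrix A (homogeneous of order τ = 2), works in the positive orthant (δ = 1), and minimises L(θ) − μ Σ log θ_k by cyclic coordinate descent, where each coordinate update is the positive root of a quadratic equation. All function names are illustrative.

```python
import math


def parity_solve(A, mu, sweeps=500, tol=1e-13):
    """Minimise 0.5*theta'A theta - mu*sum(log theta_k) over theta > 0.

    Cyclic coordinate descent: with the other coordinates fixed, the optimal
    theta_k is the positive root of A[k][k]*x**2 + c*x - mu = 0.
    """
    q = len(A)
    theta = [1.0] * q
    for _ in range(sweeps):
        max_change = 0.0
        for k in range(q):
            c = sum(A[k][j] * theta[j] for j in range(q) if j != k)
            new = (-c + math.sqrt(c * c + 4.0 * A[k][k] * mu)) / (2.0 * A[k][k])
            max_change = max(max_change, abs(new - theta[k]))
            theta[k] = new
        if max_change < tol:
            break
    return theta


def loss(A, theta):
    """Quadratic loss 0.5 * theta' A theta (hypothetical stand-in for L)."""
    q = len(A)
    return 0.5 * sum(theta[i] * A[i][j] * theta[j] for i in range(q) for j in range(q))


def contributions(A, theta):
    """theta_k * dL/dtheta_k for each k, the quantities equalised by parity."""
    q = len(A)
    return [theta[k] * sum(A[k][j] * theta[j] for j in range(q)) for k in range(q)]


A = [[2.0, 0.6, -0.3], [0.6, 1.0, 0.2], [-0.3, 0.2, 1.5]]  # hypothetical, positive definite
theta_1 = parity_solve(A, 1.0)
theta_4 = parity_solve(A, 4.0)
```

For this example one can verify that every entry of `contributions(A, theta_1)` equals μ = 1, that `loss(A, theta_1)` equals qμ/τ = 3/2, and that `theta_4` is twice `theta_1`, in agreement with θ★(μ) = μ^{1/τ} θ★(1).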

Lastly, we prove the final statement of part (iii). If

\tilde{θ} = μ_{0}^{1 / τ} θ^{★} (1)

for some

μ_{0} > 0

then

\tilde{θ} = θ^{★} (μ_{0})

and by part (i),

\tilde{θ}

is a parity estimator. Conversely, assume that

\tilde{θ}

is a parity estimator so that there exists

μ_{0} > 0

such that

\begin{matrix} θ_{k} \frac{\partial L (θ)}{\partial θ_{k}} |_{θ = \tilde{θ}} = μ_{0} for all 1 \leq k \leq q . \end{matrix}

(A3)

Euler’s Theorem gives

L (\tilde{θ}) = q μ_{0} / τ = L (θ^{★} (μ_{0}))

. Thus,

\tilde{θ} = θ^{★} (μ_{0}) = μ_{0}^{1 / τ} θ^{★} (1)

by the uniqueness of the solution of (12). The proof is now complete.

Appendix A.2. Proof of Theorem 2

We begin by proving part (i). Based on the proof of Theorem 1, the optimisation problem in (20) has a unique solution

{({(θ^{★} (λ, t, μ))}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤}

for any parameters

(λ, t, μ)

satisfying

λ \geq 0

,

t \geq 0

, and

μ > 0

. This solution also fulfils the parity conditions specified in (11). Define

{RRSSC}_{k}

for each k as follows

{RRSSC}_{k} (θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ); λ) = θ_{k} \frac{\partial RRSS (θ (λ, t, μ), θ_{p + 1} (λ, t, μ); λ)}{\partial θ_{k}} |_{θ = θ^{★} (λ, t, μ)},

for all

0 \leq k \leq p + 1

. Since

{({(θ^{★} (λ, t, μ))}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤}

is the unique solution, it must satisfy the stationary conditions

\begin{matrix} \{\begin{matrix} {RRSSC}_{k} (θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ); λ) = μ, & for k = 0, \dots, p, \\ {RRSSC}_{k} (θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ); λ) = μ t, & for k = p + 1 . \end{matrix} \end{matrix}

(A4)

Applying Euler’s homogeneous function theorem, and

RRSS

being a homogeneous function of order

τ = 2

, we can express

\begin{matrix} RRSS (θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ); λ) & = \frac{1}{2} \sum_{k = 0}^{p + 1} {RRSSC}_{k} (θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ); λ) \\ = \frac{1}{2} (\sum_{k = 0}^{p} μ + μ t) = \frac{(p + 1 + t) μ}{2} . \end{matrix}

We find that for each

0 \leq k \leq p

,

{RRSSC}_{k} (θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ); λ) = (\frac{2}{p + 1 + t}) RRSS (θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ); λ) .

Since

RRSS (θ, θ_{p + 1}; λ)

is homogeneous,

{RRSSC}_{k}

is also homogeneous of the same order, and in turn, the following holds for all

0 \leq k \leq p

and any

m > 0

{RRSSC}_{k} (m (θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ)); λ) = \frac{2 RRSS (m (θ^{★} (λ, t, μ), θ_{p + 1}^{★} (λ, t, μ)); λ)}{p + 1 + t} .

(A5)

Setting

m = 1 / θ_{p + 1}^{★} (λ, t, μ)

, we conclude that

{({(θ^{★} (λ, t, μ))}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤} / θ_{p + 1}^{★} (λ, t, μ)

is the unique PR estimate as defined in (21) within

K_{p + 2} (δ)

, thereby satisfying the parity estimator in (13).

To show that

{({(θ^{★} (λ, t, μ))}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤} / θ_{p + 1}^{★} (λ, t, μ)

is constant with respect to

μ > 0

for any given

(λ, t)

, consider the objective function of (20), denoted by

H (θ, θ_{p + 1}; λ, t, μ)

. We find that

H (μ^{- 1 / 2} θ, μ^{- 1 / 2} θ_{p + 1}; λ, t, 1) = \frac{1}{μ} H (θ, θ_{p + 1}; λ, t, μ) + (\frac{p + 1 + t}{2}) log μ,

which implies

\underset{(θ, θ_{p + 1})}{argmin} H (μ^{- 1 / 2} θ, μ^{- 1 / 2} θ_{p + 1}; λ, t, 1) = \underset{(θ, θ_{p + 1})}{argmin} H (θ, θ_{p + 1}; λ, t, μ),

(A6)

yielding a unique solution

{(θ^{★} {(λ, t, μ)}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤}

for

μ > 0

and

t \geq 0

. Note that

\underset{(y, y_{p + 1})}{argmin} H (y, y_{p + 1}; λ, t, 1) = {(θ^{★} {(λ, t, 1)}^{⊤}, θ_{p + 1}^{★} (λ, t, 1))}^{⊤},

where

{(θ^{★} {(λ, t, 1)}^{⊤}, θ_{p + 1}^{★} (λ, t, 1))}^{⊤}

is the optimal solution in (20) with

μ = 1

. Together with (A6), we obtain

\begin{matrix} {({(θ^{★} (λ, t, μ))}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤} = μ^{1 / 2} {({(θ^{★} (λ, t, 1))}^{⊤}, θ_{p + 1}^{★} (λ, t, 1))}^{⊤} for any μ > 0, \end{matrix}

(A7)

Thus,

\begin{matrix} \frac{{({(θ^{★} (λ, t, μ))}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤}}{θ_{p + 1}^{★} (λ, t, μ)} = \frac{{({(θ^{★} (λ, t, 1))}^{⊤}, θ_{p + 1}^{★} (λ, t, 1))}^{⊤}}{θ_{p + 1}^{★} (λ, t, 1)} for any μ > 0, \end{matrix}

(A8)

showing that our PR estimates do not depend on the normalising constant

μ

. Choosing

μ^{★} > 0

such that

θ_{p + 1}^{★} (λ, t, μ^{★}) = 1

, then (A7) leads to the required result displayed in (23). Furthermore, (A7) and (A8) yield

{(θ^{★} {(λ, t, μ^{★})}^{⊤}, 1)}^{⊤} = {(μ^{★})}^{1 / 2} {({(θ^{★} (λ, t, 1))}^{⊤}, θ_{p + 1}^{★} (λ, t, 1))}^{⊤} = \frac{{({(θ^{★} (λ, t, 1))}^{⊤}, θ_{p + 1}^{★} (λ, t, 1))}^{⊤}}{θ_{p + 1}^{★} (λ, t, 1)},

from which we conclude that

μ^{★} = {(θ_{p + 1}^{★} (λ, t, 1))}^{- 2}

. The rest of the proof of part (i) is straightforward, since it relies on arguments similar to those above.
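The independence from the normalising constant can also be illustrated numerically. The sketch below is a stand-in under explicit assumptions: RRSS is replaced by a hypothetical quadratic form ½ θᵀAθ with positive definite A (also homogeneous of order 2), the cone is the positive orthant, the last coordinate plays the role of θ_{p+1} with weight t > 0, and the solver is a generic cyclic coordinate descent rather than the authors' implementation.

```python
import math


def weighted_barrier_solve(A, weights, mu, sweeps=2000, tol=1e-14):
    """Minimise 0.5*theta'A theta - mu*sum_k weights[k]*log(theta_k), theta > 0.

    Each coordinate update is the positive root of
    A[k][k]*x**2 + c*x - mu*weights[k] = 0.
    """
    n = len(A)
    theta = [1.0] * n
    for _ in range(sweeps):
        change = 0.0
        for k in range(n):
            c = sum(A[k][j] * theta[j] for j in range(n) if j != k)
            disc = c * c + 4.0 * A[k][k] * mu * weights[k]
            x = (-c + math.sqrt(disc)) / (2.0 * A[k][k])
            change = max(change, abs(x - theta[k]))
            theta[k] = x
        if change < tol:
            break
    return theta


A = [[1.5, 0.4, -0.5], [0.4, 1.0, -0.3], [-0.5, -0.3, 2.0]]  # hypothetical, positive definite
t = 0.7
weights = [1.0, 1.0, t]   # unit weights for the parameters, weight t for the last coordinate

# Solutions divided by their last coordinate, for three values of mu.
normalised = []
for mu in (0.1, 1.0, 25.0):
    theta = weighted_barrier_solve(A, weights, mu)
    normalised.append([v / theta[-1] for v in theta])

# The value of mu that makes the last coordinate equal to one.
theta_one = weighted_barrier_solve(A, weights, 1.0)
mu_star = theta_one[-1] ** -2
theta_star = weighted_barrier_solve(A, weights, mu_star)
```

The three normalised vectors coincide up to numerical tolerance, mirroring (A8), and solving with `mu_star` returns a vector whose last coordinate equals one, mirroring the expression for μ^{★} derived above.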

Next, we show the proof of part (ii). We begin by noting that (24) is a convex optimisation problem because the objective function

RRSS (θ, θ_{p + 1}; λ)

is strictly convex in

(θ, θ_{p + 1})

. To establish the existence of a solution, we observe that: (i) the objective function grows unboundedly near infinity (as shown in the proof of Theorem 1), and (ii) any point

(θ, θ_{p + 1}) \in K_{p + 2} (δ)

becomes infeasible when

∥ θ_{k} ∥_{- \infty} \leq ϵ

for sufficiently small

ϵ

, since

lim_{t \to 0^{+}} log t = - \infty

. This implies that the feasible set of (24), a subset of

K_{p + 2} (δ)

, excludes boundary points where any

θ_{k}

approaches zero, effectively restricting the search to a compact subset of

K_{p + 2} (δ)

. Given this compactness and the continuity of the strictly convex objective

RRSS (θ, θ_{p + 1}; λ)

, a solution is guaranteed to exist.

We now proceed to show that (24) admits a unique solution, which requires two main steps.

First, we show that the inequality constraint in (24) is binding (holds as an identity) at any optimal solution

(z^{★ ⊤}, z_{p + 1}^{★})

of (24). If the constraint were not binding, we could define

κ^{★} = exp \{\frac{1}{p + 1 + t} (\tilde{μ} - \sum_{k = 0}^{p} log (δ_{k} z_{k}^{★}) - t log (δ_{p + 1} z_{p + 1}^{★}))\},

and note that

0 < κ^{★} < 1

, as the constraint would be non-binding at

(z^{★ ⊤}, z_{p + 1}^{★})

, while the rescaled point (κ^{★} z^{★}, κ^{★} z_{p + 1}^{★}) satisfies the constraint as an identity and is therefore feasible. Then,

RRSS (κ^{★} z^{★}, κ^{★} z_{p + 1}^{★}; λ) = {(κ^{★})}^{2} RRSS (z^{★}, z_{p + 1}^{★}; λ) < RRSS (z^{★}, z_{p + 1}^{★}; λ),

where the first equality is due to the homogeneity of order 2 of

RRSS (\cdot; λ)

on

K_{p + 2} (δ)

, and the inequality follows from

0 < κ^{★} < 1

, (17), and (18). This contradicts our assumption that

(z^{★ ⊤}, z_{p + 1}^{★})

is a solution of (24), implying that any solution of (24) must satisfy the constraint as an identity.

Next, assume that there exist two distinct solutions,

(z^{★ ⊤}, z_{p + 1}^{★})

and

(z^{★ ★ ⊤}, z_{p + 1}^{★ ★})

, for a given tuple

(λ, t, \tilde{μ})

. Define

(z^{★ ★ ★}, z_{p + 1}^{★ ★ ★}) = γ (z^{★ ⊤}, z_{p + 1}^{★}) + (1 - γ) (z^{★ ★ ⊤}, z_{p + 1}^{★ ★}) where 0 < γ < 1 .

Since (24) is a convex problem,

(z^{★ ★ ★}, z_{p + 1}^{★ ★ ★})

solves (24). Now, we find

\begin{matrix} \sum_{k = 0}^{p} log (δ_{k} z_{k}^{★ ★ ★}) + t log (δ_{p + 1} z_{p + 1}^{★ ★ ★}) \\ > γ (\sum_{k = 0}^{p} log (δ_{k} z_{k}^{★}) + t log (δ_{p + 1} z_{p + 1}^{★})) + (1 - γ) (\sum_{k = 0}^{p} log (δ_{k} z_{k}^{★ ★}) + t log (δ_{p + 1} z_{p + 1}^{★ ★})) \\ = \tilde{μ}, \end{matrix}

(A9)

since

log (\cdot)

is strictly concave on

R_{+}^{★}

and both

(z^{★ ⊤}, z_{p + 1}^{★})

and

(z^{★ ★ ⊤}, z_{p + 1}^{★ ★})

satisfy the constraint as an identity. However, since

(z^{★ ★ ★}, z_{p + 1}^{★ ★ ★})

solves (24), it must also satisfy the constraint as an identity, leading to a contradiction with (A9). This completes the proof of uniqueness for (24).

Further, by Euler’s homogeneous function theorem and the Karush–Kuhn–Tucker (KKT) conditions, we have

{RRSSC}_{k} (θ^{★ ★} (λ, t, \tilde{μ}), θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}); λ) = (\frac{2}{p + 1 + t}) RRSS (θ^{★ ★} (λ, t, \tilde{μ}), θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}); λ),

which, together with the equivalent variant of (A5), implies that

{({(θ^{★ ★} (λ, t, \tilde{μ}))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}))}^{⊤} / θ_{p + 1}^{★ ★} (λ, t, \tilde{μ})

is the unique PR estimate as defined in (25) within

K_{p + 2} (δ)

, fulfilling the condition in (13).
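The constant 2 / (p + 1 + t) in the last display can be recovered in three short steps. The sketch below assumes that the risk contribution is the usual Euler term, RRSSC_k = θ_k ∂RRSS/∂θ_k (the definition itself is given in the main text and is not reproduced in this appendix), and writes γ for the multiplier of the logarithmic constraint in (24).

```latex
\begin{aligned}
\sum_{k=0}^{p+1} \theta_k \,\frac{\partial\, \mathrm{RRSS}}{\partial \theta_k}
  &= 2\, \mathrm{RRSS}
  && \text{(Euler's theorem, homogeneity of order 2)} \\
\theta_k \,\frac{\partial\, \mathrm{RRSS}}{\partial \theta_k} &= \gamma
  \quad (k = 0, \dots, p),
  \qquad
\theta_{p+1} \,\frac{\partial\, \mathrm{RRSS}}{\partial \theta_{p+1}} = \gamma\, t
  && \text{(stationarity of the Lagrangian)} \\
\gamma\, (p + 1 + t) &= 2\, \mathrm{RRSS}
  \;\Longrightarrow\;
  \gamma = \frac{2}{p + 1 + t}\, \mathrm{RRSS}
  && \text{(summing the contributions)}
\end{aligned}
```

Under these assumptions each of the first p + 1 coordinates carries the same share of RRSS, and the last coordinate carries t times that share, which is the parity property used in the remainder of the proof.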

We now demonstrate that

{({(θ^{★ ★} (λ, t, \tilde{μ}))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}))}^{⊤} / θ_{p + 1}^{★ ★} (λ, t, \tilde{μ})

is constant with respect to

\tilde{μ} \in R

for any fixed

(λ, t)

. For any tuple

(λ, t, \tilde{μ})

, the unique solution of (24), denoted by

{({(θ^{★ ★} (λ, t, \tilde{μ}))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}))}^{⊤}

, can equivalently be obtained by solving

\begin{matrix} \{\begin{matrix} min_{(θ, θ_{p + 1}) \in K_{p + 2} (δ)} & RRSS (e^{\frac{\tilde{μ}}{p + 1 + t}} θ, e^{\frac{\tilde{μ}}{p + 1 + t}} θ_{p + 1}; λ) \\ s . t . & \sum_{k = 0}^{p} log (δ_{k} e^{\frac{\tilde{μ}}{p + 1 + t}} θ_{k}) + t log (δ_{p + 1} e^{\frac{\tilde{μ}}{p + 1 + t}} θ_{p + 1}) \geq 0 \end{matrix} \end{matrix}

(A10)

This problem admits a unique solution

{(e^{\frac{{\tilde{μ}}^{★}}{p + 1 + t}} θ^{★ ★} {(λ, t, 0)}^{⊤}, e^{\frac{{\tilde{μ}}^{★}}{p + 1 + t}} θ_{p + 1}^{★ ★} (λ, t, 0))}^{⊤}

for

\tilde{μ} = 0

and

t \geq 0

. The uniqueness of the solution for (24) and (A10) implies that

\begin{matrix} {({(θ^{★ ★} (λ, t, \tilde{μ}))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}))}^{⊤} = e^{\frac{\tilde{μ}}{p + 1 + t}} {({(θ^{★ ★} (λ, t, 0))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, 0))}^{⊤} for all \tilde{μ} \in R . \end{matrix}

(A11)

Thus,

\begin{matrix} \frac{{({(θ^{★ ★} (λ, t, \tilde{μ}))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}))}^{⊤}}{θ_{p + 1}^{★ ★} (λ, t, \tilde{μ})} = \frac{{({(θ^{★ ★} (λ, t, 0))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, 0))}^{⊤}}{θ_{p + 1}^{★ ★} (λ, t, 0)} for all \tilde{μ} \in R . \end{matrix}

(A12)

This confirms that PR estimates are independent of the normalising constant

\tilde{μ}

. By selecting

{\tilde{μ}}^{★}

so that

θ_{p + 1}^{★ ★} (λ, t, {\tilde{μ}}^{★}) = 1

, we arrive at the desired result in (26), which is a straightforward consequence of (A11). Moreover, (A11) and (A12) give that

{(θ^{★ ★} {(λ, t, {\tilde{μ}}^{★})}^{⊤}, 1)}^{⊤} = e^{\frac{{\tilde{μ}}^{★}}{p + 1 + t}} {({(θ^{★ ★} (λ, t, 0))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, 0))}^{⊤} = \frac{{({(θ^{★ ★} (λ, t, 0))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, 0))}^{⊤}}{θ_{p + 1}^{★ ★} (λ, t, 0)},

from which we conclude that

{\tilde{μ}}^{★} = - (p + 1 + t) log (θ_{p + 1}^{★ ★} (λ, t, 0))

. Further, we note that Slater’s condition is satisfied in (24), and therefore strong duality holds in (24).

Finally, we proceed with the proof of part (iii). Using the notation introduced in parts (i) and (ii), recall that the PR estimates in (21) and (25) satisfy

\begin{matrix} {\hat{β}}^{P R} (λ, t) = \frac{θ^{★} (λ, t, μ)}{θ_{p + 1}^{★} (λ, t, μ)} and {\hat{\hat{β}}}^{P R} (λ, t) = \frac{θ^{★ ★} (λ, t, \tilde{μ})}{θ_{p + 1}^{★ ★} (λ, t, \tilde{μ})} \end{matrix}

(A13)

for any

μ > 0

and

\tilde{μ} \in R

. Since strong duality holds in (24), let

γ^{★}

be the dual optimal multiplier in (24) associated with the logarithmic constraint

\sum_{k = 0}^{p} log (δ_{k} θ_{k}) + t log (δ_{p + 1} θ_{p + 1}) \geq \tilde{μ} .

Then, (24) is equivalent to minimising the Lagrangian

\begin{matrix} L (θ, θ_{p + 1}; λ, t, \tilde{μ}; γ) = RRSS (θ, θ_{p + 1}; λ) - γ (\sum_{k = 0}^{p} log (δ_{k} θ_{k}) + t log (δ_{p + 1} θ_{p + 1}) - \tilde{μ}), \end{matrix}

(A14)

and the KKT conditions imply that

γ = γ^{★}

, ensuring that the constraint is active, i.e.,

\sum_{k = 0}^{p} log (δ_{k} θ_{k}^{★ ★}) + t log (δ_{p + 1} θ_{p + 1}^{★ ★}) = \tilde{μ} .

Furthermore, the stationarity conditions for (A14) yield

\begin{matrix} \{\begin{matrix} {RRSSC}_{k} (θ^{★ ★} (λ, t, \tilde{μ}), θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}); λ) = γ^{★}, & for k = 0, \dots, p, \\ {RRSSC}_{k} (θ^{★ ★} (λ, t, \tilde{μ}), θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}); λ) = γ^{★} t, & for k = p + 1 . \end{matrix} \end{matrix}

(A15)

Therefore, (A4) and (A15) imply that solving the primal in (24) is equivalent to solving (20) with

μ = γ^{★}

. There exists

α^{★} > 0

such that

\begin{matrix} {({(θ^{★} (λ, t, μ))}^{⊤}, θ_{p + 1}^{★} (λ, t, μ))}^{⊤} = α^{★} {({(θ^{★ ★} (λ, t, \tilde{μ}))}^{⊤}, θ_{p + 1}^{★ ★} (λ, t, \tilde{μ}))}^{⊤} \end{matrix}

(A16)

for any

μ > 0

and

\tilde{μ} \in R

. Equations (A13) and (A16) imply that

{\hat{β}}^{P R} (λ, t) = {\hat{\hat{β}}}^{P R} (λ, t)

. The proof is now complete.

Appendix A.3. Proof of Proposition 2

The proofs of parts (i) and (ii) are very similar, and thus, we only show part (i). Theorem 2 (i) tells us that there exists

μ^{★} > 0

such that

{({({\hat{β}}^{P R} (λ, t))}^{⊤}, 1)}^{⊤}

uniquely solves (20) with

μ = μ^{★}

, and in turn we have that

\begin{matrix} RRSS ({\hat{β}}^{P R} (λ, t), 1; λ) - μ^{★} \sum_{k = 0}^{p} log (δ_{k} {\hat{β}}_{k}^{P R} (λ, t)) \\ < RRSS ({\hat{β}}^{R R} (λ), 1; λ) - μ^{★} \sum_{k = 0}^{p} log (δ_{k} {\hat{β}}_{k}^{R R} (λ)) . \end{matrix}

Consequently,

\begin{matrix} 0 \leq RRSS ({\hat{β}}^{P R} (λ, t), 1; λ) - RRSS ({\hat{β}}^{R R} (λ), 1; λ) < μ^{★} \sum_{k = 0}^{p} log (\frac{{\hat{β}}_{k}^{P R} (λ, t)}{{\hat{β}}_{k}^{R R} (λ)}), \end{matrix}

where the first inequality is due to (4), and in turn we conclude (27), since

μ^{★}, t \geq 0

. The proof of part (i) is now complete, which completes the entire proof.

Appendix B. Data Generation Process for Synthetic Data

This section provides a detailed description of the DGP employed in the numerical experiments presented in Section 3. The design aims to assess the performance of the various estimators under controlled levels of multicollinearity and heteroscedasticity. For reproducibility, the procedure is structured into three main steps, which are outlined below.

Step 1: Feature Generation

(i): Correlation Matrix Construction: We construct a symmetric correlation matrix $Σ \in R^{m \times m}$ for the $m$ features. The elements are defined as $Σ_{i j} = ρ^{| i - j |}$ for $i \neq j$ and $Σ_{i i} = 1$, simulating multicollinearity. The correlation parameter $ρ$ is varied across $\{- 0.75, - 0.5, 0, 0.5, 0.75\}$ to assess the models under different levels of association.
(ii): Feature Vector Simulation: Feature vectors $X_{i}$ are generated from a multivariate normal distribution $N (0, Σ)$. Each feature is subsequently standardised to have zero mean and unit variance.
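Step 1 can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions rather than the code used for the paper: the function names, the seed, and the example values of n, m, and ρ are hypothetical, and the standardisation uses the population standard deviation (`ddof=0`), which the text does not specify.

```python
import numpy as np

def ar1_correlation(m, rho):
    """Correlation matrix with entries rho**|i - j| and ones on the diagonal."""
    idx = np.arange(m)
    return np.power(float(rho), np.abs(idx[:, None] - idx[None, :]))

def simulate_features(n, m, rho, rng):
    """Draw n rows from N(0, Sigma) and standardise each column."""
    sigma = ar1_correlation(m, rho)
    X = rng.multivariate_normal(np.zeros(m), sigma, size=n)
    # Population standard deviation (ddof=0) is an assumption of this sketch.
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(2026)   # hypothetical seed
X = simulate_features(n=250, m=10, rho=-0.75, rng=rng)
print(X.shape)
```

The matrix produced by `ar1_correlation` is positive definite for every |ρ| < 1, so each value of ρ used in the study defines a valid covariance for the multivariate normal draw.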

Step 2: Response Variable Generation

(i): Regression Parameters: We generate the true regression parameters $β$, where each component is defined as $β_{k} = {(- 1)}^{k} ⌈ k / 2 ⌉$ for $k = 1, \dots, p$. This introduces a diverse range of predictor effects, alternating in sign and growing in magnitude, to test the robustness of the estimators.
(ii): Response Simulation: For each observation $i$, we compute the linear predictor $η_{i} = X_{i}^{T} β$. The response variable $Y_{i}$ is then simulated from a univariate Gaussian distribution $N (η_{i}, σ_{i}^{2})$, with $σ_{i}$ specified in (iii).
(iii): Heteroscedasticity and Normalisation: To incorporate heteroscedasticity, the standard deviation $σ_{i}$ is drawn as the absolute value of a normal random variable with a mean of 10 and a standard deviation of 1. Finally, the response variable $Y_{i}$ is centred to ensure a zero mean across the entire dataset.
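Step 2 can be sketched as follows. The snippet is illustrative only: `true_beta` and `simulate_response` are hypothetical names, the feature matrix is a plain standard normal placeholder with p columns so that the block runs on its own, and the seed and dimensions are arbitrary.

```python
import numpy as np

def true_beta(p):
    """beta_k = (-1)**k * ceil(k / 2) for k = 1, ..., p."""
    k = np.arange(1, p + 1)
    return (-1.0) ** k * np.ceil(k / 2)

def simulate_response(X, beta, rng):
    """Heteroscedastic Gaussian response, centred to have zero mean."""
    n = X.shape[0]
    eta = X @ beta                                    # linear predictor
    sigma = np.abs(rng.normal(10.0, 1.0, size=n))     # |N(10, 1)| noise scale per observation
    y = rng.normal(eta, sigma)
    return y - y.mean()

rng = np.random.default_rng(7)           # hypothetical seed
n, p = 200, 4
X = rng.standard_normal((n, p))          # placeholder for the standardised features
beta = true_beta(p)
y = simulate_response(X, beta, rng)
print(beta)
```

For p = 4 the coefficient vector is (−1, 1, −2, 2), so the signal alternates in sign while the noise scale stays close to 10, which makes the design deliberately noisy.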

Step 3: Experimental Scale and Repetitions

(i): Dimensions and Sample Sizes: The simulation considers varying numbers of covariates $m = p + 1 \in \{2, 10, 25\}$. For each $m$, the sample size $n$ is determined by specific ratios of the sample size to the number of covariates, namely $n / m \in \{10, 25, 50, 100\}$.
(ii): Statistical Reliability: To ensure the statistical significance of the reported results, all quantities and performance metrics are computed based on $n = 1000$ samples and $N = 1000$ independent repetitions for each scenario.
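The scenario grid implied by Steps 1 and 3 can be enumerated directly. The sketch below reads item (i) literally and sets n to the ratio n / m times m; that reading, the dictionary layout, and the name `scenarios` are assumptions made for illustration.

```python
from itertools import product

M_VALUES = (2, 10, 25)                  # number of covariates m
RATIOS = (10, 25, 50, 100)              # sample-size ratios n / m
RHOS = (-0.75, -0.5, 0.0, 0.5, 0.75)    # correlation parameter rho

def scenarios():
    """One dictionary per (m, n / m, rho) combination."""
    return [
        {"m": m, "ratio": ratio, "n": ratio * m, "rho": rho}
        for m, ratio, rho in product(M_VALUES, RATIOS, RHOS)
    ]

grid = scenarios()
print(len(grid), grid[0], grid[-1])
```

The grid has 3 × 4 × 5 = 60 cells, matching the three panels, four ratio blocks, and five values of ρ reported in Table 1.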

Appendix C. Description of WTI and Brent Data

This section provides additional technical details regarding the data sources, the rationale for selecting specific oil benchmarks, the granular timeline of the structural segments used in the empirical analysis, and the overall correlation matrices illustrating the dependence structure among the covariates.

Appendix C.1. Market Selection and Data Sources

The selection of WTI and Brent is motivated by their established status as the primary global price benchmarks for crude oil. WTI serves as the underlying commodity for the New York Mercantile Exchange (NYMEX) futures and reflects North American supply–demand dynamics. Brent, traded on the Intercontinental Exchange (ICE), acts as the benchmark for roughly two-thirds of the world’s internationally traded physical crude oil. Both benchmarks are widely adopted in the commodity literature for analysing risk premiums, price discovery, and factor-based investment strategies (Bakshi et al. 2019; Sakkas and Tessaromatis 2020).

The monthly data, spanning January 1980 to September 2024, were sourced from Bloomberg. They include front-month futures prices used to calculate excess returns, a standard proxy for commodity investment performance in empirical finance (Yang 2013), as well as various market-based and liquidity-based metrics used to construct the predictive factors (Hong and Yogo 2012). Although WTI data have been available since 1983, the Brent sample must start in April 1998 to meet the liquidity requirements for including Brent and to ensure that a fully balanced panel can be formed across all covariates.

Appendix C.2. Detailed Factor Background

The factors employed in our analysis are calculated based on methodologies established in the commodity investing literature, particularly those developed for identifying priced risk premia in futures markets (Sakkas and Tessaromatis 2020). These factors capture distinct dimensions of the commodity risk premium and are described below.

Inventory and Term Structure: Factors such as Basis (slope of the term structure) and Basis Momentum (change in basis) capture the “roll yield” associated with the shape of the futures curve. The Basis factor is rooted in the Theory of Storage (Working 1949) and the Hedging Pressure Hypothesis (Keynes 1930), identifying backwardation and contango as fundamental drivers of expected returns. Basis Momentum, defined as the difference between the momentum signals of the first and second nearby contracts, provides compensation for commodity volatility and curve dynamics (Boons and Prado 2019).

Trend and Sentiment: Momentum (past returns) captures the tendency of commodity returns to persist over a 12-month horizon (Miffre and Rallis 2007). Hedging Pressure (commercial vs. non-commercial positioning) and Open Interest (market liquidity) reflect the positioning of commercial hedgers relative to speculators and the overall risk absorption capacity of the market (Hong and Yogo 2012). Based on the theory of normal backwardation (Keynes 1930), these factors proxy for the risk premium demanded by speculators for providing insurance to producers (Bessembinder 1992). Due to data limitations, Hedging Pressure is omitted for Brent as the Commodity Futures Trading Commission (CFTC) reports primarily cover US-based exchanges like NYMEX.

Risk, Value, and Macro: Skewness (third moment of returns) and Volatility (return dispersion) address non-normal return distributions and “fat tails” typical of energy markets, capturing compensation for jump and variance risks (Fernandez-Perez et al. 2018). Inflation Beta (sensitivity to inflation) captures the historically documented role of commodities as a hedge against unexpected rising price levels (Gorton and Rouwenhorst 2006). Finally, the Value (spot-to-long-term-average ratio) factor identifies long-term mean reversion by comparing current prices to their five-year historical average (Asness et al. 2013).

Appendix C.3. Detailed Structural Breakdown of Testing Periods

We applied the Bai and Perron (2003) test to the monthly returns to identify the structural breakpoints that define the OOS periods, as shown in Figure A1 and Figure A2. This method ensures that the data intervals align with actual changes in market volatility and average returns. Table A1 provides an overview of the 12 periods for WTI and the 11 periods for Brent, with specific annotations for key market events, including periods of major crisis.

Figure A1. Time series of WTI crude oil performance. Notes. This figure depicts the monthly excess returns (A) and accumulated performance (B) of WTI oil from August 1988 to September 2024. The solid red vertical lines indicate the dataset boundaries. The dashed orange vertical lines depict structural breakpoints identified by the Bai and Perron (2003) test, highlighting major economic or market events used for predictive segmentation.

Figure A2. Time series of Brent crude oil performance. Notes. This figure depicts the monthly excess returns (A) and accumulated performance (B) of Brent oil from April 1998 to September 2024. The solid red vertical lines indicate the dataset boundaries. The dashed orange vertical lines represent key structural breakpoints identified by the Bai and Perron (2003) test used to define periods for predictive modelling.

Table A1. Detailed segmented periods for WTI and Brent with key market events.

	WTI Periods			Brent Periods
Period	Start	End	Key Market Event	Start	End	Key Market Event
1	August 1988	December 1990	1990 Supply Shock	April 1998	August 2000	Asian Financial Crisis
2	January 1991	October 1993	Early 90s Oversupply	September 2000	April 2003	Early 2000s Recession
3	November 1993	January 1997	Mid-90s Expansion	May 2003	August 2005	China Demand Boom
4	February 1997	August 2000	Asian Financial Crisis	September 2005	August 2008	Pre-GFC Commodity Supercycle
5	September 2000	August 2008	2000s Commodity Supercycle	September 2008	January 2011	Global Financial Crisis
6	September 2008	May 2011	Global Financial Crisis	February 2011	May 2014	Post-Crisis Recovery
7	June 2011	June 2014	Post-Crisis Recovery	June 2014	March 2016	2014–16 Price Collapse
8	July 2014	March 2016	2014–16 Price Collapse	April 2016	September 2018	OPEC+ Production Cuts
9	April 2016	September 2018	OPEC+ Production Cuts	October 2018	January 2020	US–China Trade Tension
10	October 2018	February 2020	US–China Trade Tension	February 2020	July 2022	COVID-19 Pandemic
11	March 2020	July 2022	COVID-19 Pandemic	August 2022	September 2024	Post-Pandemic
12	August 2022	September 2024	Post-Pandemic	–	–	–

Notes. This table lists the segmented time periods identified by the Bai and Perron (2003) structural breakpoint test for both WTI and Brent crude oil datasets. Major global economic events impacting the volatility and price structure of both benchmarks are highlighted in bold. For each OOS iteration, models are trained on all data prior to the start of the current period to evaluate predictive accuracy in the subsequent regime.
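The expanding-window scheme described in the notes can be sketched with the first four WTI periods of Table A1. The helper names are hypothetical, months are represented as (year, month) tuples to keep the block dependency-free, and skipping a period that has no earlier data to train on is an assumption of this sketch rather than a documented choice.

```python
def month_range(start, end):
    """List of (year, month) tuples from start to end, inclusive."""
    months, (year, month) = [], start
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months

def expanding_splits(months, periods):
    """Train on every month before a period starts, test on the period itself."""
    splits = []
    for start, end in periods:
        train = [t for t in months if t < start]
        test = [t for t in months if start <= t <= end]
        if train and test:              # a period with no earlier data is skipped
            splits.append((train, test))
    return splits

# First four WTI periods from Table A1 (start month, end month).
WTI_PERIODS = [
    ((1988, 8), (1990, 12)),
    ((1991, 1), (1993, 10)),
    ((1993, 11), (1997, 1)),
    ((1997, 2), (2000, 8)),
]

months = month_range((1988, 8), (2000, 8))
splits = expanding_splits(months, WTI_PERIODS)
for train, test in splits:
    print(len(train), "training months ->", test[0], "to", test[-1])
```

Because each training set ends strictly before its test period begins, no information from the regime being forecast enters the fit.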

Appendix C.4. Regime Characteristics and Model Stability

The structural periods we have identified encompass a variety of volatility regimes, providing a practical approach for testing both traditional and regularised estimators. The results indicate that the best-performing model depends to a large extent on the macroeconomic environment. During periods of stability or moderate growth, such as the Mid-90s Expansion (WTI Period 3), the 2000s Commodity Supercycle (WTI Period 5) and the Pre-GFC Commodity Supercycle (Brent Period 4), OLS often produces highly competitive or even superior forecasting results. As factor correlations remain relatively stable in these calm markets, standard OLS estimators can capture the data structure without the bias introduced by shrinkage methods.

However, during periods of high market volatility, OLS typically suffers from severe variance inflation. For example, during the OPEC+ Production Cuts (WTI, Period 9) and the COVID-19 pandemic (Brent, Period 10), the breakdown of historical factor relationships led to significant OOS forecasting errors in OLS. Although traditional shrinkage models such as RR and Liu outperform OLS under these conditions, they remain relatively sensitive to sudden spikes in volatility. As these methods employ a single penalty parameter to shrink all coefficients, sudden shocks to specific factors may cause the model to overshrink important signals or produce unstable estimates. This sensitivity often undermines OOS forecasting performance, as exemplified during the Global Financial Crisis (Brent, Period 5). At that time, the RR and Liu models underperformed even OLS due to an overreaction to market noise.

On the other hand, the PR framework, especially

{PR}_{c}

, maintains predictive stability through its explicit parity constraint. By focusing on risk distribution rather than just the overall size of the parameters, PR is less exposed to the volatility spikes that often disrupt standard penalised models. Instead of simply shrinking coefficients like RR or Liu, PR ensures that no single factor takes over the model’s overall risk profile. This prevents overfitting to market noise or extreme events, such as the severe negative WTI returns in April 2020 (WTI, Period 11) or the 2014–2016 Price Collapse (Brent, Period 7). This type of regularisation is highly effective in unstable markets, as the parity algorithm keeps predictions stable even when historical correlations break down entirely.

Appendix C.5. Correlation Matrices for Real Datasets

This section provides the overall Pearson correlation matrices for the covariates used in our empirical analysis to check for multicollinearity. We calculate these matrices using the entire pooled sample period rather than individual rolling windows. Table A2 presents the

9 \times 9

correlation matrix for the WTI dataset, and Table A3 presents the

8 \times 8

correlation matrix for the Brent dataset.

Specifically, Table A2 reveals dependencies in the WTI dataset, such as a high positive correlation of 0.621 between Momentum (Cov 1) and Basis Momentum (Cov 3) and a moderate negative correlation of −0.436 between Momentum and Value (Cov 9). Table A3 shows similar multicollinearity in the Brent dataset, notably the strong relationship between Momentum and Basis Momentum (0.612) and a moderate positive correlation of 0.499 between Inflation Beta (Cov 5) and Value (Cov 8). These interrelated covariate structures justify our application of regularised estimators to stabilise predictions.
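A simple screen for correlated covariate pairs can be run on the published WTI matrix. The sketch below copies the entries of Table A2, while the function name and the cut-off of 0.4 are hypothetical choices made for illustration; on raw data the matrix could be obtained with `np.corrcoef(X, rowvar=False)` instead.

```python
import numpy as np

NAMES = ["Momentum", "Basis", "Basis Momentum", "Skewness", "Inflation Beta",
         "Volatility", "Hedging Pressure", "Open Interest", "Value"]

# Entries of Table A2 (WTI covariates, pooled sample).
R_WTI = np.array([
    [ 1.000, -0.398,  0.621, -0.009, -0.047, -0.048,  0.057, -0.032, -0.436],
    [-0.398,  1.000, -0.378,  0.229,  0.200, -0.040,  0.072, -0.010,  0.191],
    [ 0.621, -0.378,  1.000,  0.072,  0.038,  0.060,  0.150, -0.037, -0.114],
    [-0.009,  0.229,  0.072,  1.000, -0.197,  0.058, -0.052,  0.005,  0.098],
    [-0.047,  0.200,  0.038, -0.197,  1.000,  0.026,  0.195, -0.005,  0.107],
    [-0.048, -0.040,  0.060,  0.058,  0.026,  1.000, -0.047, -0.016,  0.056],
    [ 0.057,  0.072,  0.150, -0.052,  0.195, -0.047,  1.000, -0.015,  0.250],
    [-0.032, -0.010, -0.037,  0.005, -0.005, -0.016, -0.015,  1.000, -0.001],
    [-0.436,  0.191, -0.114,  0.098,  0.107,  0.056,  0.250, -0.001,  1.000],
])

def flag_pairs(R, names, threshold=0.4):
    """Return (name_i, name_j, r) for every pair with |r| at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(R[i, j])))
    return pairs

for name_i, name_j, r in flag_pairs(R_WTI, NAMES):
    print(f"{name_i} / {name_j}: {r:+.3f}")
```

With this cut-off the screen returns the two WTI pairs singled out in the text, Momentum with Basis Momentum and Momentum with Value.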

Table A2. Overall correlation matrix for WTI covariates.

	Cov 1	Cov 2	Cov 3	Cov 4	Cov 5	Cov 6	Cov 7	Cov 8	Cov 9
Cov 1	1.000	−0.398	0.621	−0.009	−0.047	−0.048	0.057	−0.032	−0.436
Cov 2	−0.398	1.000	−0.378	0.229	0.200	−0.040	0.072	−0.010	0.191
Cov 3	0.621	−0.378	1.000	0.072	0.038	0.060	0.150	−0.037	−0.114
Cov 4	−0.009	0.229	0.072	1.000	−0.197	0.058	−0.052	0.005	0.098
Cov 5	−0.047	0.200	0.038	−0.197	1.000	0.026	0.195	−0.005	0.107
Cov 6	−0.048	−0.040	0.060	0.058	0.026	1.000	−0.047	−0.016	0.056
Cov 7	0.057	0.072	0.150	−0.052	0.195	−0.047	1.000	−0.015	0.250
Cov 8	−0.032	−0.010	−0.037	0.005	−0.005	−0.016	−0.015	1.000	−0.001
Cov 9	−0.436	0.191	−0.114	0.098	0.107	0.056	0.250	−0.001	1.000

Notes. The table presents the Pearson correlation matrix for the nine covariates used in the WTI dataset across the entire pooled sample period. The covariates are defined as follows: Cov 1 = Momentum, Cov 2 = Basis, Cov 3 = Basis Momentum, Cov 4 = Skewness, Cov 5 = Inflation Beta, Cov 6 = Volatility, Cov 7 = Hedging Pressure, Cov 8 = Open Interest, Cov 9 = Value.

Table A3. Overall correlation matrix for Brent covariates.

	Cov 1	Cov 2	Cov 3	Cov 4	Cov 5	Cov 6	Cov 7	Cov 8
Cov 1	1.000	−0.434	0.612	−0.107	−0.084	0.024	0.004	−0.431
Cov 2	−0.434	1.000	−0.476	0.204	0.093	0.023	0.025	0.210
Cov 3	0.612	−0.476	1.000	0.020	0.068	−0.002	0.033	−0.083
Cov 4	−0.107	0.204	0.020	1.000	−0.175	−0.014	−0.004	0.195
Cov 5	−0.084	0.093	0.068	−0.175	1.000	−0.011	−0.005	0.499
Cov 6	0.024	0.023	−0.002	−0.014	−0.011	1.000	−0.032	−0.010
Cov 7	0.004	0.025	0.033	−0.004	−0.005	−0.032	1.000	−0.009
Cov 8	−0.431	0.210	−0.083	0.195	0.499	−0.010	−0.009	1.000

Notes. The table presents the Pearson correlation matrix for the eight covariates used in the Brent dataset across the entire pooled sample period. The covariates are defined as follows: Cov 1 = Momentum, Cov 2 = Basis, Cov 3 = Basis Momentum, Cov 4 = Skewness, Cov 5 = Inflation Beta, Cov 6 = Volatility, Cov 7 = Open Interest, Cov 8 = Value.

Note

1	Available at https://cran.r-project.org/web/packages/savvyPR/index.html (accessed on 17 March 2026).

References

Asimit, Alexandru V., Edward Furman, Qihe Tang, and Raluca Vernic. 2011. Asymptotics for risk capital allocations based on conditional tail expectation. Insurance: Mathematics and Economics 49: 310–24.
Asimit, Alexandru V., Raluca Vernic, and Ričardas Zitikis. 2013. Evaluating risk measures and capital allocations based on multi-losses driven by a heavy-tailed background risk: The multivariate Pareto-II model. Risks 1: 14–33.
Asimit, Vali, Liang Peng, Radu Tunaru, and Feng Zhou. Forthcoming-a. Risk Budgeting Under General Risk Measures. Available online: https://openaccess.city.ac.uk/id/eprint/33733/ (accessed on 5 March 2026).
Asimit, Vali, Liang Peng, Ruodu Wang, and Alex Yu. 2019. An efficient approach to quantile capital allocation and sensitivity analysis. Mathematical Finance 29: 1131–56.
Asimit, Vali, Marina Anca Cidota, Ziwei Chen, and Jennifer Asimit. Forthcoming-b. Slab and Shrinkage Linear Regression Estimation. Available online: https://openaccess.city.ac.uk/id/eprint/35005/ (accessed on 5 March 2026).
Asimit, Vali, Wing Fung Chong, Radu Tunaru, and Feng Zhou. 2025. Portfolio selection and risk sharing via risk budgeting. Insurance: Mathematics and Economics 125: 103139.
Asimit, Vali, Ziwei Chen, and Nathan Lassance. Forthcoming-c. Distribution-free shrinkage of high-dimensional mean vector. Journal of Business & Economic Statistics.
Asness, Clifford S., Tobias J. Moskowitz, and Lasse Heje Pedersen. 2013. Value and momentum everywhere. The Journal of Finance 68: 929–85.
Bai, Jushan, and Pierre Perron. 2003. Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18: 1–22.
Bakshi, Gurdip, Xiaohui Gao, and Alberto G. Rossi. 2019. Understanding the sources of risk underlying the cross section of commodity returns. Management Science 65: 459–954.
Bessembinder, Hendrik. 1992. Systematic risk, hedging pressure, and risk premiums in futures markets. The Review of Financial Studies 5: 637–67.
Bodnar, Olha, Taras Bodnar, and Nestor Parolya. 2022. Recent advances in shrinkage-based high-dimensional inference. Journal of Multivariate Analysis 188: 104826.
Boons, Martijn, and Melissa Porras Prado. 2019. Basis momentum. The Journal of Finance 74: 239–79.
Chen, Shaobing Scott, and David L. Donoho. 1994. Basis pursuit. Paper presented at the 1994 28th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, October 31–November 2, vol. 1, pp. 41–44.
Fernandez-Perez, Adrian, Bart Frijns, Ana-Maria Fuertes, and Joelle Miffre. 2018. The skewness of commodity futures returns. Journal of Banking & Finance 86: 143–58.
Gauss, Carl Friedrich. 1821. Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Göttingen: Henricus Dieterich.
Gorton, Gary, and K. Geert Rouwenhorst. 2006. Facts and fantasies about commodity futures. Financial Analysts Journal 62: 47–68.
Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Berlin/Heidelberg: Springer, vol. 2.
Hoerl, Arthur E., and Robert W. Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12: 55–67.
Hong, Harrison, and Motohiro Yogo. 2012. What does futures market interest tell us about the macroeconomy and asset prices? Journal of Financial Economics 105: 473–90.
James, William, and Charles Stein. 1961. Estimation with quadratic loss. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Oakland: University of California Press, vol. 1, pp. 361–79.
Keynes, John Maynard. 1930. A Treatise on Money. London: Macmillan.
Liu, Kejian. 1993. A new class of biased estimate in linear regression. Communications in Statistics-Theory and Methods 22: 393–402.
Liu, Kejian. 2003. Using Liu-type estimator to combat collinearity. Communications in Statistics-Theory and Methods 32: 1009–20.
Markov, Andreĭ Andreevich. 1912. Wahrscheinlichkeitsrechnung. Leipzig: B. G. Teubner.
Miffre, Joëlle, and Georgios Rallis. 2007. Momentum strategies in commodity futures markets. Journal of Banking & Finance 31: 1863–86.
Sakkas, Athanasios, and Nikolaos Tessaromatis. 2020. Factor based commodity investing. Journal of Banking & Finance 115: 105782.
Seber, George A. F., and Alan J. Lee. 2003. Linear Regression Analysis, 2nd ed. Hoboken: John Wiley & Sons.
Stein, Charles. 1956. Inadmissibility of the usual estimator for the mean of a multivariate distribution. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Oakland: University of California Press, vol. 1, pp. 197–206.
Tasche, D. 1999. Risk contributions and performance measurement. Working Paper, Lehrstuhl für Mathematische Statistik, TU München. Available online: https://www.financerisks.com/filedati/WP/CAPITAL%20ALLOCATION/RISK%20PERFORMANCE%20MEASUREMENT.pdf (accessed on 5 March 2026).
Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology 58: 267–88.
Tikhonov, Andrey Nikolayevich. 1943. On the stability of inverse problems. Doklady Akademii Nauk SSSR 39: 195–98.
Working, Holbrook. 1949. The theory of price of storage. The American Economic Review 39: 1254–62.
Yang, Fan. 2013. Investment shocks and the commodity basis spread. Journal of Financial Economics 110: 164–84.

Figure 1. Geometric interpretation of PR estimation in

K_{2} (1, 1)

for

p = 1

,

λ = 0

, and an appropriately chosen

μ^{★} > 0

.

Table 1. Simulation study results for different numbers of covariates.

	$n / m = 10$					$n / m = 25$					$n / m = 50$					$n / m = 100$
$ρ$	−0.75	−0.5	0	0.5	0.75	−0.75	−0.5	0	0.5	0.75	−0.75	−0.5	0	0.5	0.75	−0.75	−0.5	0	0.5	0.75
Panel A: $m = 2$
OLS	4.15	3.23	2.82	3.23	4.15	2.66	2.07	1.80	2.07	2.66	1.81	1.43	1.27	1.43	1.81	1.31	1.02	0.90	1.02	1.31
OLS	(2.84)	(1.96)	(1.53)	(1.96)	(2.84)	(1.66)	(1.17)	(0.94)	(1.17)	(1.66)	(1.13)	(0.80)	(0.67)	(0.80)	(1.13)	(0.79)	(0.56)	(0.46)	(0.56)	(0.79)
RR	2.03	1.78	1.70	1.91	2.23	1.47	1.34	1.36	1.55	1.77	1.14	1.09	1.17	1.35	1.52	0.87	0.85	0.93	1.13	1.34
RR	(2.26)	(1.49)	(1.06)	(1.41)	(2.14)	(1.28)	(0.83)	(0.58)	(0.68)	(1.06)	(0.83)	(0.57)	(0.46)	(0.44)	(0.58)	(0.57)	(0.43)	(0.39)	(0.41)	(0.41)
Liu	2.02	1.79	1.73	1.95	2.29	1.38	1.24	1.25	1.45	1.72	1.08	1.01	1.06	1.22	1.41	0.91	0.83	0.85	1.00	1.18
Liu	(2.10)	(1.45)	(1.13)	(1.39)	(1.96)	(1.28)	(0.89)	(0.69)	(0.80)	(1.14)	(0.82)	(0.59)	(0.49)	(0.52)	(0.68)	(0.55)	(0.40)	(0.36)	(0.41)	(0.48)
${PR}_{c}$	2.17	1.89	1.77	1.94	2.25	1.58	1.40	1.36	1.51	1.75	1.17	1.08	1.13	1.31	1.49	0.85	0.82	0.91	1.13	1.34
${PR}_{c}$	(2.21)	(1.47)	(1.11)	(1.43)	(2.11)	(1.28)	(0.86)	(0.65)	(0.79)	(1.12)	(0.84)	(0.62)	(0.53)	(0.51)	(0.66)	(0.56)	(0.45)	(0.41)	(0.41)	(0.40)
${PR}_{t}$	2.30	1.93	1.77	1.92	2.24	1.75	1.46	1.35	1.47	1.72	1.36	1.13	1.09	1.20	1.40	1.05	0.83	0.77	0.92	1.13
${PR}_{t}$	(2.20)	(1.48)	(1.12)	(1.46)	(2.13)	(1.30)	(0.90)	(0.67)	(0.82)	(1.17)	(0.96)	(0.71)	(0.58)	(0.61)	(0.78)	(0.86)	(0.62)	(0.50)	(0.52)	(0.61)
Panel B: $m = 10$
OLS	5.83	4.06	3.24	4.05	5.82	3.60	2.51	1.99	2.51	3.60	2.56	1.78	1.42	1.78	2.56	1.79	1.25	0.99	1.25	1.79
OLS	(1.64)	(1.07)	(0.77)	(1.06)	(1.63)	(1.01)	(0.65)	(0.46)	(0.66)	(1.01)	(0.72)	(0.47)	(0.33)	(0.47)	(0.72)	(0.50)	(0.33)	(0.23)	(0.33)	(0.50)
RR	3.15	3.09	3.15	4.23	6.38	2.35	2.16	1.98	2.54	3.70	1.88	1.63	1.41	1.80	2.60	1.44	1.18	0.99	1.25	1.81
RR	(1.03)	(0.79)	(0.75)	(1.24)	(2.05)	(0.71)	(0.54)	(0.46)	(0.69)	(1.10)	(0.55)	(0.42)	(0.33)	(0.47)	(0.75)	(0.40)	(0.30)	(0.23)	(0.33)	(0.52)
Liu	4.27	3.49	3.16	4.16	5.87	3.17	2.36	1.98	2.55	3.69	2.40	1.73	1.41	1.80	2.60	1.74	1.23	0.99	1.26	1.81
Liu	(1.28)	(0.89)	(0.76)	(1.17)	(1.69)	(0.89)	(0.61)	(0.45)	(0.70)	(1.09)	(0.67)	(0.45)	(0.33)	(0.48)	(0.76)	(0.48)	(0.32)	(0.23)	(0.34)	(0.53)
${PR}_{c}$	3.44	3.15	3.12	4.20	6.31	2.41	2.09	1.95	2.55	3.72	1.83	1.54	1.38	1.81	2.62	1.35	1.11	0.99	1.25	1.80
${PR}_{c}$	(1.19)	(0.88)	(0.76)	(1.24)	(2.04)	(0.81)	(0.58)	(0.46)	(0.68)	(1.08)	(0.61)	(0.42)	(0.33)	(0.48)	(0.75)	(0.40)	(0.29)	(0.23)	(0.33)	(0.52)
${PR}_{t}$	4.19	3.26	3.10	4.21	6.18	2.68	2.07	1.99	2.78	4.01	1.93	1.52	1.51	2.19	3.20	1.33	1.10	1.22	1.90	2.76
${PR}_{t}$	(1.59)	(1.06)	(0.80)	(1.28)	(1.95)	(1.21)	(0.75)	(0.51)	(0.69)	(1.12)	(0.95)	(0.56)	(0.34)	(0.48)	(0.72)	(0.62)	(0.29)	(0.21)	(0.31)	(0.42)
Panel C: $m = 25$
OLS	6.17	4.25	3.31	4.25	6.18	3.75	2.58	2.03	2.58	3.75	2.63	1.81	1.42	1.81	2.63	1.85	1.27	1.00	1.27	1.85
OLS	(1.12)	(0.73)	(0.49)	(0.73)	(1.12)	(0.64)	(0.42)	(0.29)	(0.42)	(0.64)	(0.45)	(0.30)	(0.20)	(0.30)	(0.45)	(0.32)	(0.21)	(0.14)	(0.21)	(0.32)
RR	4.66	3.99	3.31	4.26	6.18	3.22	2.52	2.03	2.59	3.75	2.41	1.79	1.42	1.81	2.63	1.75	1.27	0.99	1.27	1.85
RR	(0.85)	(0.66)	(0.50)	(0.73)	(1.15)	(0.55)	(0.40)	(0.29)	(0.41)	(0.65)	(0.41)	(0.29)	(0.20)	(0.30)	(0.45)	(0.30)	(0.21)	(0.14)	(0.21)	(0.32)
Liu	5.97	4.19	3.31	4.26	6.19	3.70	2.58	2.03	2.60	3.75	2.63	1.81	1.42	1.81	2.66	1.85	1.27	1.00	1.27	1.85
Liu	(1.08)	(0.71)	(0.50)	(0.74)	(1.14)	(0.63)	(0.41)	(0.29)	(0.42)	(0.64)	(0.45)	(0.29)	(0.20)	(0.30)	(0.46)	(0.32)	(0.21)	(0.14)	(0.21)	(0.32)
${PR}_{c}$	4.77	3.89	3.31	4.29	6.20	3.08	2.43	2.02	2.59	3.78	2.27	1.74	1.41	1.82	2.64	1.68	1.24	1.00	1.27	1.84
${PR}_{c}$	(1.06)	(0.69)	(0.50)	(0.73)	(1.14)	(0.59)	(0.40)	(0.29)	(0.42)	(0.65)	(0.41)	(0.28)	(0.20)	(0.30)	(0.46)	(0.29)	(0.20)	(0.14)	(0.21)	(0.32)
${PR}_{t}$	5.16	3.96	3.43	4.76	7.16	3.22	2.43	2.23	3.43	5.45	2.30	1.75	1.73	2.96	4.89	1.66	1.31	1.44	2.70	4.60
${PR}_{t}$	(1.49)	(0.85)	(0.54)	(0.78)	(1.13)	(0.94)	(0.47)	(0.32)	(0.46)	(0.67)	(0.61)	(0.29)	(0.22)	(0.36)	(0.49)	(0.31)	(0.18)	(0.16)	(0.27)	(0.34)

Notes. This table reports the average $L_{2}$-distances, as defined in (30), with the corresponding standard deviation (in brackets) defined in (31), based on $N = 1000$ independent repetitions with sample size $n = 1000$. We compare OLS, RR, Liu, and the two PR variants (${PR}_{c}$, ${PR}_{t}$) across correlation parameters $\rho \in \{-0.75, -0.5, 0, 0.5, 0.75\}$ and ratios $n/m \in \{10, 25, 50, 100\}$. Results are organised into Panel A ($m = 2$), Panel B ($m = 10$), and Panel C ($m = 25$). Red values indicate the lowest average $L_{2}$-distance, while underlined values denote the second-best performing estimator in each scenario.
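As a rough illustration of how each pair of entries in Table 1 (average and bracketed standard deviation) could be produced, the sketch below summarises the distances between estimated and true coefficient vectors across repetitions. It assumes that (30) is a plain Euclidean distance averaged over repetitions and that (31) is the sample standard deviation; the exact normalisation in those equations is not reproduced here, and the function name `l2_summary` and the toy numbers are hypothetical.

```python
import math
import statistics


def l2_summary(estimates, beta_true):
    """Return (mean, sample standard deviation) of the Euclidean distances
    between each estimated coefficient vector and the true vector.

    Assumption: a plain Euclidean norm and the (N - 1)-denominator standard
    deviation; the paper's definitions (30) and (31) may differ in scaling.
    """
    distances = [math.dist(beta_hat, beta_true) for beta_hat in estimates]
    return statistics.mean(distances), statistics.stdev(distances)


# Three hypothetical repetitions for a model with m = 2 coefficients.
beta_true = [1.0, -1.0]
estimates = [[1.0, -1.0], [4.0, 3.0], [1.0, 2.0]]

mean_dist, sd_dist = l2_summary(estimates, beta_true)
print(f"{mean_dist:.2f} ({sd_dist:.2f})")
```

In an actual study the list `estimates` would hold one fitted coefficient vector per repetition for a given estimator and scenario.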

Table 2. Summary of simulation study.

	Panel A ( $m = 2$ )		Panel B ( $m = 10$ )		Panel C ( $m = 25$ )		Total (All Panels)
Model	Best	2nd	Best	2nd	Best	2nd	Best	2nd
OLS	0	0	9	0	8	5	17	5
RR	4	5	3	6	2	4	9	15
Liu	9	8	0	4	0	1	9	13
${PR}_{c}$	2	2	3	9	9	5	14	16
${PR}_{t}$	5	5	5	1	1	5	11	11
PR (Combined)	7	7	8	10	10	10	25	27
Total Scenarios	20	20	20	20	20	20	60	60

Notes. This table summarises the performance of all estimators across 60 independent scenarios, based on the results reported in Table 1. “Best” and “2nd” denote the number of times an estimator achieved the lowest and second-lowest average $L_{2}$-distance, respectively.
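The tallies in Table 2 can be reproduced by ranking the estimators within each scenario and counting first and second places. The following is a minimal sketch of that counting rule, using three columns of Table 1 as input (the first column of Panel B and the first two columns of Panel C). The function name `rank_counts` is hypothetical, and ties at two decimals are broken here by dictionary order, which is an assumption rather than the authors' rule.

```python
from collections import Counter


def rank_counts(scenarios):
    """Count how often each estimator has the lowest and the second-lowest
    average distance across a list of scenarios (dicts: estimator -> value)."""
    best, second = Counter(), Counter()
    for scores in scenarios:
        # sorted() is stable, so ties keep the dictionary's insertion order.
        ranked = sorted(scores, key=scores.get)
        best[ranked[0]] += 1
        second[ranked[1]] += 1
    return best, second


scenarios = [
    {"OLS": 5.83, "RR": 3.15, "Liu": 4.27, "PR_c": 3.44, "PR_t": 4.19},
    {"OLS": 6.17, "RR": 4.66, "Liu": 5.97, "PR_c": 4.77, "PR_t": 5.16},
    {"OLS": 4.25, "RR": 3.99, "Liu": 4.19, "PR_c": 3.89, "PR_t": 3.96},
]

best, second = rank_counts(scenarios)
print(dict(best), dict(second))
```

Because several scenarios in Table 1 coincide at two decimal places, a full reproduction of Table 2 would need the unrounded averages.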

Table 3. Performance comparison for WTI monthly returns.

Predicted	RMSE (%)					MAE (%)
Period	OLS	RR	Liu	${PR}_{c}$	${PR}_{t}$	OLS	RR	Liu	${PR}_{c}$	${PR}_{t}$
2	79.89	25.85	32.63	24.99	26.49	72.22	21.31	26.33	22.08	24.16
3	19.48	19.76	19.67	19.53	19.57	16.09	16.62	16.41	16.42	16.47
4	37.40	33.03	34.74	33.86	34.92	29.14	26.30	27.49	26.74	27.52
5	10.31	14.82	13.26	10.95	14.89	8.63	12.38	11.23	9.23	12.45
6	38.94	92.96	91.21	79.65	90.42	31.47	82.77	81.57	69.84	80.21
7	28.04	12.04	15.13	14.20	11.27	23.96	9.52	13.10	11.86	8.81
8	30.73	12.24	11.88	11.57	12.10	26.95	9.42	9.24	9.01	9.32
9	138.28	9.26	42.43	8.71	131.42	52.93	7.75	22.09	6.99	51.41
10	31.80	16.58	15.47	16.62	20.21	23.77	13.33	12.84	13.36	15.48
11	67.56	30.65	33.66	28.51	29.00	57.45	20.92	23.95	20.40	20.32
12	51.40	43.35	43.62	43.35	43.35	47.98	42.10	42.43	42.10	42.10
Best Count	3	2	1	4	1	3	3	1	2	2
2nd Best	0	4	1	5	1	0	3	2	6	0

Notes. The table presents the RMSE and MAE (in percentage points) defined in (33) for OLS, RR, Liu, ${PR}_{c}$ and ${PR}_{t}$ over 11 OOS predicted periods. For each period, the model with the best performance is highlighted in red, while the second-best is underlined. The bottom two rows summarise the total count of best and second-best performances achieved by each estimator across all periods.
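For readers who want to recompute the two error measures, the sketch below uses the standard textbook definitions of RMSE and MAE. It assumes these coincide with (33), which is not reproduced here; the return series is hypothetical and expressed in percentage points.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of the forecasts."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error of the forecasts."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical monthly returns (in %) for one out-of-sample period.
actual = [2.0, -1.0, 0.5, 3.0]
predicted = [1.0, 1.0, 0.5, 1.0]

period_rmse = rmse(actual, predicted)
period_mae = mae(actual, predicted)
print(period_rmse, period_mae)
```

RMSE is never smaller than MAE on the same errors, and a wide gap between the two, as in some periods of Tables 3 and 4, points to a few large forecast misses rather than uniformly poor predictions.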

Table 4. Performance comparison for Brent monthly returns.

Predicted	RMSE (%)					MAE (%)
Period	OLS	RR	Liu	${PR}_{c}$	${PR}_{t}$	OLS	RR	Liu	${PR}_{c}$	${PR}_{t}$
2	36.14	30.13	28.91	30.45	30.03	31.92	26.69	25.47	27.01	26.60
3	13.79	14.24	13.99	14.17	14.20	11.16	11.99	11.72	11.81	11.88
4	8.62	10.63	8.80	10.11	11.02	6.89	8.69	6.90	8.09	8.68
5	83.72	123.75	122.31	66.33	138.25	80.43	111.96	110.77	63.22	122.86
6	60.75	22.98	36.27	26.13	28.84	29.01	13.74	18.64	14.80	15.67
7	11.69	12.48	11.69	10.92	11.05	8.99	10.72	9.76	8.90	9.10
8	40.52	23.52	16.66	22.62	21.02	33.75	19.18	13.68	18.44	17.18
9	21.14	15.72	15.80	15.62	16.24	18.09	13.82	13.87	13.70	14.41
10	163.04	61.51	72.91	28.44	127.71	110.94	40.70	49.40	25.62	81.05
11	68.70	42.69	48.40	42.71	42.71	67.32	41.45	47.32	41.48	41.48
Best Count	2	2	2	4	0	2	2	2	4	0
2nd Best	1	2	2	2	4	2	2	2	2	3

Notes. The table presents the RMSE and MAE (in percentage points) defined in (33) for OLS, RR, Liu, ${PR}_{c}$ and ${PR}_{t}$ over 10 OOS predicted periods. For each period, the model with the best performance is highlighted in red, while the second-best is underlined. The bottom two rows summarise the total count of best and second-best performances achieved by each estimator across all periods.
