Minimum Residual Sum of Squares Estimation Method for High-Dimensional Partial Correlation Coefﬁcient

The partial correlation coefficient (Pcor) is a vital statistical tool employed across various scientific domains to decipher intricate relationships and reveal inherent mechanisms. However, existing methods for estimating Pcor often overlook the accuracy of its calculation. In response, this paper introduces a minimum residual sum of squares Pcor estimation method (MRSS), a high-precision approach tailored for high-dimensional scenarios. Notably, the MRSS algorithm reduces the estimation bias encountered with positive Pcor. Through simulations on high-dimensional data, encompassing both sparse and non-sparse conditions, MRSS consistently mitigates the estimation bias for positive Pcors, surpassing the other algorithms discussed. For instance, for large sample sizes (n ≥ 100) with Pcor > 0, the MRSS algorithm reduces the MSE and RMSE by about 30-70% compared to other algorithms. The robustness and stability of the MRSS algorithm are demonstrated by a sensitivity analysis over variance and sparsity parameters. Stock data from China's A-share market are employed to showcase the MRSS methodology's practicality.


Introduction
The partial correlation coefficient (Pcor) measures the correlation between two random variables, X and Y, after accounting for the effects of controlling variables Z, and is denoted by ρ_XY|Z. The Pcor essentially quantifies the unique relationship between X and Y after removing the correlations between X and Z, and between Y and Z [1]. This correlation coefficient provides a more thorough comprehension of the connection between variables, untainted by the influence of confounding factors. Unlike the Pearson correlation coefficient, which only captures the direct correlation between random variables, the Pcor enables the identification of whether correlations stem from intermediary variables. This distinction enhances the precision and validity of statistical analyses.
The Pcor is a fundamental statistical tool for investigating intricate relationships and gaining a more profound comprehension of the underlying mechanisms in a variety of scientific fields, such as psychology, biology, economics, and the social sciences. When examining genetic markers and illness outcomes, biologists used the Pcor to identify correlations while accounting for potential confounding factors [2][3][4]. Marrelec et al. utilised the partial correlation matrix to explore large-scale functional brain networks through functional MRI [5]. In the field of economics, the Pcor assists in comprehending complex connections, including the interplay between interest rates and inflation, while considering the influence of other variables [6]. The financial industry also employs the Pcor to interpret connections and relationships between stocks in the financial markets [7,8]. For example, Michis proposed a wavelet procedure for estimating the Pcor between stock market returns over different time scales and implemented it for portfolio diversification [9]. Using partial correlations within a complex network framework, Singh et al. examined the degree of globalisation and regionalisation of stock market linkages and how these linkages vary across different economic or market cycles [10]. Meanwhile, the employment of the Gaussian graphical model (GGM) technique in psychology has recently gained popularity for defining the relationships between observed variables. This technique employs Pcors to represent pairwise interdependencies, controlling for the influence of all other variables [11][12][13].
In the field of geography, a correlation analysis based on the Pcor of the fractal dimension of the variations of the HZD components is implemented to study the geomagnetic field component variations in Russia [14].
Several methodologies have been proposed over the years to estimate the Pcor in statistical analyses. For instance, Peng et al. introduced a Pcor estimation technique that relies on the sparsity of the partial correlation matrix and utilises sparse regression methods [3]. Khare et al. suggested a high-dimensional graphical model selection approach based on pseudolikelihood [15]. Kim provided an R package, "ppcor", for fast calculation of the semi-partial correlation [16]. Huang et al. introduced the kernel partial correlation coefficient as a measure of the conditional dependence between two random variables in various topological spaces [17]. Van Aert and Goos focused on calculating the sampling variance of the Pcor [18]. Hu and Qiu proposed a statistical inference procedure for the Pcor under the high-dimensional nonparanormal model [19]. However, these methods mainly centre on determining whether or not the partial correlation coefficient is zero, with little regard for the precision of the Pcor calculation and the algorithm's efficacy. We analysed multiple high-dimensional algorithms and discovered notable Pcor estimation biases, particularly for positive Pcor; even with larger sample sizes, these biases persisted. Motivated by these findings, our primary goal is to put forward a Pcor estimation algorithm that increases estimation precision and diminishes the estimation bias for positive Pcor values.
This paper reviews current methods for estimating the Pcor in high-dimensional data. We introduce a novel minimum residual sum of squares (MRSS) Pcor estimation method for high-dimensional conditions, aiming to mitigate the estimation bias for positive Pcor. The algorithm's effectiveness is validated through simulation studies under sparse and non-sparse conditions and a real data analysis of stock markets.
The sections are structured as follows: Section 2 outlines definitions and corresponding formulae for calculating the Pcor, and examines common algorithms for estimating it. Section 3 presents our minimum residual sum of squares Pcor estimation, designed to mitigate the estimation bias for positive Pcor. In Section 4, we demonstrate the effectiveness of the proposed algorithm through simulation studies on high-dimensional data under both sparse and non-sparse conditions. Section 5 provides an analysis of real data related to stock markets, while Section 6 contains the conclusion.

Estimation for Partial Correlation Coefficient

Definition of Pcor
The classical definition of the partial correlation coefficient is the correlation coefficient between the regression residuals from the linear models of the two variables on the controlling variables. Let X and Y be two random variables, and Z = [Z_1, Z_2, ..., Z_p] be p-dimensional controlling variables. Consider the linear regression models of X and Y, respectively, on the controlling variables Z,

X = α^T Z + ε,  Y = β^T Z + ζ,

where ε and ζ are error terms. The partial correlation coefficient between X and Y conditional on Z, denoted by ρ_XY|Z, is defined as the correlation coefficient between the regression residuals ε and ζ, as follows:

ρ_XY|Z = cor(ε, ζ) = cov(ε, ζ) / √(var(ε) var(ζ)),  (1)

where cor(·,·) is the correlation coefficient of two random variables, cov(·,·) is the covariance of two random variables, and var(·) is the variance of a random variable. Let the sample size be n. In conventional low-dimensional cases (p < n), ordinary least squares (OLS) is used to compute the residuals ε and ζ, and the Pcor is then computed as the correlation coefficient of the residuals. However, the OLS method is not practical for high-dimensional cases (p > n); regularisation methods, introduced later, deal with such cases.

Based on Concentration Matrix
The concentration matrix can also be used to calculate the Pcor. Let U = [X, Y, Z] and Σ = cov(U) be the covariance matrix. Assuming that Σ is non-singular, the concentration matrix is denoted as Ω = Σ^{-1} = (ω_ij). Partition U into U_1 = [X, Y] and U_2 = Z, and consider the linear regression of U_1 on U_2, where b̂ is the estimator of the regression coefficient b and Û_1 is the estimator of U_1. The regression residual ê ∼ N(0, V) is independent of Û_1, and its covariance matrix V can be computed as the inverse of the top-left 2 × 2 block of Ω. According to the definition in Equation (1), the partial correlation coefficient can be computed by

ρ_XY|Z = −ω_12 / √(ω_11 ω_22).  (2)
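The identity in Equation (2) holds exactly for the sample covariance as well, so it can be checked numerically against the residual definition of Equation (1). A small sketch (Python/NumPy stand-in for the R implementation; data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
Z = rng.normal(size=(n, 2))
eps = rng.normal(size=n)
zeta = 0.6 * eps + rng.normal(size=n)
X = Z.sum(axis=1) + eps
Y = Z[:, 0] - Z[:, 1] + zeta

# Centred joint sample U = (X, Y, Z1, Z2).
U = np.column_stack([X, Y, Z])
U -= U.mean(axis=0)

# Concentration matrix Omega = Sigma^{-1}.
omega = np.linalg.inv(np.cov(U, rowvar=False))

# Equation (2): rho_XY|Z = -omega_12 / sqrt(omega_11 * omega_22).
pcor_conc = -omega[0, 1] / np.sqrt(omega[0, 0] * omega[1, 1])

# Cross-check against the residual definition, Equation (1).
Xc, Yc, Zc = U[:, 0], U[:, 1], U[:, 2:]
bx, *_ = np.linalg.lstsq(Zc, Xc, rcond=None)
by, *_ = np.linalg.lstsq(Zc, Yc, rcond=None)
pcor_res = np.corrcoef(Xc - Zc @ bx, Yc - Zc @ by)[0, 1]
print(round(pcor_conc, 3))
```

The two estimates agree to machine precision, which follows from the Schur-complement relation between Ω and the residual covariance V mentioned above.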

Based on Additional Regression Models
Additional linear regression models are introduced to calculate the Pcor. Consider new linear regression models of X on [Y, Z] and Y on [X, Z], respectively,

X = λ_0 Y + λ^T Z + η,  (3)
Y = γ_0 X + γ^T Z + τ,  (4)

where η and τ are regression error terms. Peng et al. [3] established the connection between the aforementioned regression coefficients and the Pcor, verifying that λ_0 = −ω_12/ω_11, γ_0 = −ω_21/ω_22, var(η) = 1/ω_11 and var(τ) = 1/ω_22 hold. From these relations,

ρ_XY|Z = λ_0 √(var(τ)/var(η)),  (5)

and we derive λ_0 γ_0 = ρ²_XY|Z. Thus, the partial correlation coefficient between X and Y can be calculated by the formula below,

ρ_XY|Z = sign(λ_0) √(λ_0 γ_0),  (6)

where sign(·) is the sign function.
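The relation λ_0 γ_0 = ρ²_XY|Z behind Equation (6) can be verified numerically. The sketch below (Python/NumPy; hypothetical data, OLS in place of the regularised fits, which is valid here since p < n) fits the two additional regressions and compares the result with the residual definition:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
Z = rng.normal(size=(n, 3))
eps = rng.normal(size=n)
zeta = 0.5 * eps + rng.normal(size=n)
X = Z @ np.array([1.0, 0.5, -0.5]) + eps
Y = Z @ np.array([-0.3, 0.8, 0.2]) + zeta

# Centre everything so no intercept is needed.
X = X - X.mean(); Y = Y - Y.mean(); Z = Z - Z.mean(axis=0)

def first_coef(target, design):
    """First OLS regression coefficient of `target` on `design`."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[0]

lam0 = first_coef(X, np.column_stack([Y, Z]))   # X ~ [Y, Z]
gam0 = first_coef(Y, np.column_stack([X, Z]))   # Y ~ [X, Z]

# Equation (6): rho = sign(lam0) * sqrt(lam0 * gam0).
pcor_coef = np.sign(lam0) * np.sqrt(lam0 * gam0)

# Residual-definition benchmark, Equation (1).
bx, *_ = np.linalg.lstsq(Z, X, rcond=None)
by, *_ = np.linalg.lstsq(Z, Y, rcond=None)
pcor_res = np.corrcoef(X - Z @ bx, Y - Z @ by)[0, 1]
print(round(pcor_coef, 3))
```

Because λ̂_0 γ̂_0 equals the squared sample partial correlation exactly, the product is never negative and the square root is always well defined.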
Consider linear regression models of Y on [1, Z] and Y on [1, X, Z], respectively, where ζ and τ are the error terms. The partial correlation coefficient can then also be calculated through the reduction in residual variance, as follows [20]:

ρ²_XY|Z = 1 − var(τ)/var(ζ).  (7)

Here, we have presented five distinct formulae, (1), (2), (5), (6), and (7), for calculating the Pcor based on diverse regression models. Specific algorithms applicable to high-dimensional scenarios are presented in the following section.

Regularisation Regression for High-Dimensional Cases
Suppose we have centralised samples {x_j, y_j, z_j1, . . ., z_jp} for j = 1, . . ., n. We consider matrix-type linear regression models as follows,

X = Zα + ε,  (8)
Y = Zβ + ζ,  (9)

where X = [x_1, . . ., x_n]^T, Y = [y_1, . . ., y_n]^T, and Z = (z_ji) is the n × p design matrix. In high-dimensional (p > n) situations, penalty functions and regularisation regression methods can be introduced to estimate the regression coefficients of these models. Regularisation regression methods address overfitting in statistical modelling by adding a penalty to the loss function, constraining the coefficient magnitudes. Let p_λ(β) be the penalty function with a tuning parameter λ; for example, the regularisation estimate of model (8) is obtained by minimising the penalised least squares criterion, where the penalty p_λ(α) can be chosen widely: the Lasso penalty [21], the Ridge penalty [22], the SCAD penalty [23], the Elastic net [24], the Fused lasso [25], the MCP penalty [26], and other penalty functions. In this paper, the Lasso regularisation, with penalty p_λ(α) = λ||α||_1, is implemented with the R package "glmnet" [27], and the MCP, with penalty derivative p′_λ(α) = (1/t)(tλ − α)_+ for t > 1, is implemented with the R package "ncvreg".

Existing Pcor Estimation Algorithms
To investigate high-dimensional Pcor estimation methods, we present some existing methods that are suitable for both sparse and non-sparse conditions. Combining the advantages and disadvantages of these methods, we then propose a new high-dimensional Pcor estimation method: MRSS, the minimum residual sum of squares partial correlation coefficient estimation algorithm.

Res Algorithm
The Res algorithm follows directly from the definition of the Pcor. It is implemented as follows. First, we apply regularisation regression (Lasso or MCP) to the linear models (8) and (9) to obtain the estimated regression coefficients α̂ and β̂; then we calculate the estimated residuals ε̂ = X − X̂ and ζ̂ = Y − Ŷ, with X̂ = Zα̂ and Ŷ = Zβ̂; finally, we estimate the Pcor by

ρ̂_res = cor(ε̂, ζ̂).  (10)
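A runnable sketch of the Res algorithm, with a plain coordinate-descent Lasso standing in for the paper's glmnet/ncvreg fits (Python/NumPy; the data, seed, and tuning value λ are hypothetical):

```python
import numpy as np

def lasso_cd(Z, y, lam, n_sweeps=100):
    """Coordinate-descent Lasso: argmin 0.5*||y - Z b||^2 + n*lam*||b||_1."""
    n, p = Z.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()          # running residual y - Z @ beta
    col_sq = (Z ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            rho_j = Z[:, j] @ r + col_sq[j] * beta[j]
            new_bj = np.sign(rho_j) * max(abs(rho_j) - n * lam, 0.0) / col_sq[j]
            r += Z[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

rng = np.random.default_rng(4)
n, p = 120, 300                         # high-dimensional: p > n
alpha = np.zeros(p); alpha[:5] = 1.0    # sparse true coefficients
beta = np.zeros(p); beta[:5] = -1.0
Z = rng.normal(size=(n, p))
eps = rng.normal(size=n)
zeta = (0.8 * eps + rng.normal(size=n)) / np.sqrt(1 + 0.8 ** 2)
X = Z @ alpha + eps                     # true Pcor = 0.8/sqrt(1.64), about 0.62
Y = Z @ beta + zeta

# Res algorithm: regularised fits of (8) and (9), then correlate residuals.
a_hat = lasso_cd(Z, X, lam=0.15)
b_hat = lasso_cd(Z, Y, lam=0.15)
pcor_res = np.corrcoef(X - Z @ a_hat, Y - Z @ b_hat)[0, 1]
print(round(pcor_res, 2))
```

Note the shrinkage of the Lasso leaves some signal in the residuals, which is exactly the kind of estimation bias that motivates the refinements below.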

Reg2 Algorithm
The Reg2 algorithm removes the influence of Z on X and Y more effectively using the new regressions below. Consider new linear regression models (11) and (12) with error terms η_1 and τ_1, in which the estimators X̂ = ∑_{i=1}^p α̂_i Z_i and Ŷ = ∑_{i=1}^p β̂_i Z_i are obtained from the Lasso or MCP regularisation regressions of models (8) and (9). Then, we implement ordinary least squares (OLS) on models (11) and (12) and denote the new estimators of X and Y by X̂_Reg2 and Ŷ_Reg2. Computing the new residuals η̂_1 = X − X̂_Reg2 and τ̂_1 = Y − Ŷ_Reg2, we finally estimate the Pcor by the Reg2 algorithm as ρ̂_reg2 = cor(η̂_1, τ̂_1).

Coef and Var Algorithm
The Coef and Var algorithms are generated by introducing novel regression coefficients based on the Pcor definition formulae (5) and (6). Consider the sample analogues of models (3) and (4),

X = λ_0 Y + Zλ + η_2,  (13)
Y = γ_0 X + Zγ + τ_2,  (14)

where η_2 and τ_2 are error terms. Then, we apply MCP regularisation to models (13) and (14) and obtain the estimated first-term regression coefficients λ̂_0, γ̂_0 and the estimated variances var(η̂_2), var(τ̂_2). Finally, we obtain the Pcor estimate of the Coef algorithm as ρ̂_coef = sign(λ̂_0)√(λ̂_0 γ̂_0) and the Pcor estimate of the Var algorithm as ρ̂_var = λ̂_0 √(var(τ̂_2)/var(η̂_2)).
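In low dimensions, the Coef and Var estimates coincide exactly with the residual definition, which makes a useful sanity check before moving to the regularised versions. A sketch with OLS in place of the MCP fits (Python/NumPy; hypothetical data, here with a negative Pcor):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1500
Z = rng.normal(size=(n, 4))
eps = rng.normal(size=n)
zeta = -0.4 * eps + rng.normal(size=n)          # induces a negative Pcor
X = Z @ np.array([0.5, -0.2, 0.1, 0.7]) + eps
Y = Z @ np.array([0.3, 0.3, -0.6, 0.1]) + zeta
X -= X.mean(); Y -= Y.mean(); Z -= Z.mean(axis=0)

def fit(target, design):
    """OLS coefficients and residuals of `target` on `design`."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef, target - design @ coef

coef_x, eta2 = fit(X, np.column_stack([Y, Z]))   # model (13): X ~ [Y, Z]
coef_y, tau2 = fit(Y, np.column_stack([X, Z]))   # model (14): Y ~ [X, Z]
lam0, gam0 = coef_x[0], coef_y[0]

# Coef estimate, Equation (6), and Var estimate, Equation (5).
pcor_coef = np.sign(lam0) * np.sqrt(lam0 * gam0)
pcor_var = lam0 * np.sqrt(tau2.var() / eta2.var())
print(round(pcor_coef, 3), round(pcor_var, 3))
```

With OLS the two formulae agree to machine precision; under MCP regularisation in high dimensions they can differ, which is what the MRSS selection below exploits.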

Motivation
From the comprehensive simulations in this paper, it is evident that the Pcor estimation methods discussed exhibit significant bias. This bias becomes more pronounced as the true Pcor increases, especially when the Pcor is positive. Therefore, further research is necessary to address this estimation bias in positive Pcor scenarios. While each algorithm has its merits, the Reg2 algorithm performs notably well when the Pcor is below approximately 0.5. In contrast, the Coef and Var algorithms stand out with minimal bias when the Pcor exceeds roughly 0.5. Our goal is to develop a method that synergises the strengths of the Reg2 and Var algorithms.
Comparing the models introduced in the Reg2 algorithm, (11) and (12), with models (13) and (14) from the Coef and Var algorithms, it is evident that the residuals η_1 and η_2 share commonalities: both capture the information in X after the exclusion of the Y and Z effects in some sense. Similarly, τ_1 and τ_2 capture the essence of Y after removing the X and Z influences. If we choose the η_k and τ_k with the smaller residual sum of squares, we obtain a better fit for the corresponding regression models. A reduced residual sum of squares signifies enhanced precision in eliminating the effects of the controlling variables, leading to a more accurate Pcor estimator. Guided by the objective of minimising the residual sum of squares, we introduce a novel algorithm for high-dimensional Pcor estimation in the subsequent subsection.

MRSS Algorithm and Its Implementation
We propose a novel minimum residual sum of squares partial correlation coefficient estimation algorithm, denoted by MRSS. This algorithm aims to diminish the estimation bias for positive Pcor values in high-dimensional situations. Our MRSS algorithm amalgamates the strengths of the Reg2, Coef, and Var algorithms, effectively curtailing the bias in Pcor estimation.
Define RSSX_k = ||η̂_k||²_2 and RSSY_k = ||τ̂_k||²_2 as the residual sum of squares of X after removing the effects of Y and Z, and of Y after removing the effects of X and Z, respectively. The tuning parameter k is chosen by minimising the residual sum of squares, so as to remove more of the associated effects and ensure a more efficient Pcor estimator. For k = 1, the pair (η_1, τ_1) represents the residuals from the Reg2 algorithm's models (11) and (12). For k = 2, (η_2, τ_2) corresponds to the residuals from the Coef and Var algorithms' models (13) and (14). The residuals selected by the MRSS algorithm then satisfy the minimum residual sum of squares for both X and Y:

k̂_X = argmin_{k∈{1,2}} RSSX_k,  k̂_Y = argmin_{k∈{1,2}} RSSY_k.  (17)

The Pcor estimated by MRSS is then given by

ρ̂_mrss = ρ̂_reg2 · I(k̂_X = 1 or k̂_Y = 1) + ρ̂_var · I(k̂_X = k̂_Y = 2),  (18)

where I is the indicator function and λ̂_0, the primary regression coefficient in model (13), enters through ρ̂_var.
If k = 1, then ρ̂_mrss is estimated following the idea of the Reg2 algorithm; if k = 2, then ρ̂_mrss is estimated following the idea of the Coef and Var algorithms. If the two k estimates in (17) differ, the more stable Reg2 algorithm is preferred, setting k = 1 in (18). Given that MRSS integrates two existing algorithms, its convergence rate should align with theirs.
During the implementation of the MRSS algorithm (Algorithm 1), the Coef and Var algorithms often misestimate the Pcor as 0 or ±1 when the true Pcor is close to 0 or ±1, affecting the precision. To address this, we incorporate a discriminative condition in the MRSS pseudo-code. If the estimated Pcor ρ̂_coef or ρ̂_var is zero or ±1, the Coef and Var estimates are deemed unreliable, and the Reg2 algorithm's estimate is adopted.

Algorithm 1: MRSS algorithm
Data: (X, Y, Z) with dimension (n, p). Result: Pcor estimate ρ̂_mrss.
1. Implement MCP regularisation on models (8) and (9), and obtain X̂ and Ŷ;
2. Implement ordinary least squares (OLS) on models (11) and (12), and obtain X̂_Reg2 and Ŷ_Reg2 with residuals η̂_1 = X − X̂_Reg2 and τ̂_1 = Y − Ŷ_Reg2. Calculate the RSS by RSSX_1 = ||η̂_1||²_2 and RSSY_1 = ||τ̂_1||²_2;
3. Implement MCP regularisation on models (13) and (14), obtain the estimated coefficients λ̂_0, γ̂_0 and residuals η̂_2, τ̂_2, and calculate RSSX_2 = ||η̂_2||²_2 and RSSY_2 = ||τ̂_2||²_2;
4. Compute ρ̂_reg2 = cor(η̂_1, τ̂_1), ρ̂_coef = sign(λ̂_0)√(λ̂_0 γ̂_0), and ρ̂_var = λ̂_0 √(var(τ̂_2)/var(η̂_2));
5. If ρ̂_coef or ρ̂_var equals 0 or ±1, return ρ̂_reg2; otherwise, return ρ̂_mrss according to the minimum-RSS selection in (17) and (18).

The proposed MRSS algorithm selects the most suitable residuals by minimising the RSS, removing the impact of the control variables to optimise the estimation of the residuals in the regression model. As such, the estimated Pcor generated by the MRSS algorithm combines the advantages of both algorithms, resulting in a more accurate estimate. Notably, our MRSS algorithm effectively addresses the Pcor estimation bias in cases where Pcor ≥ 0. For instance, when the Coef and Var algorithms estimate the Pcor as 0 for a true Pcor near 0, the MRSS algorithm utilises the minimum-RSS principle to select the Reg2 algorithm, which performs better in the vicinity of Pcor = 0, thereby efficiently avoiding such misestimations. Around Pcor = 0.5, the MRSS algorithm employs the minimum-RSS principle to determine the more accurate method between Reg2 and Var. This selection conforms to the minimum-RSS principle, whereby the regression model and accompanying residuals are selected to provide optimal estimation accuracy, leading to a more precise Pcor estimate. When the Pcor lies close to 1, the Reg2 algorithm's estimates are typically too low, with high RSS values; the MRSS method then selects the Var algorithm, whose RSS is small and which performs better under the minimum-RSS principle. In essence, the MRSS method amalgamates the merits of the Reg2 and Var algorithms. By minimising the residual sum of squares, MRSS chooses the algorithm with the smaller estimation error for Pcor ≥ 0, which allows for proficient control of the estimation bias of the Pcor.
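The selection logic of Algorithm 1 can be sketched as a small decision function; this is a paraphrase of the steps above in Python, assuming the residual sums of squares and the three candidate estimates have already been computed:

```python
def mrss_select(rss_x, rss_y, pcor_reg2, pcor_coef, pcor_var):
    """Minimum-RSS selection rule of the MRSS algorithm.

    rss_x, rss_y: pairs of (RSS for k=1 from Reg2, RSS for k=2 from Coef/Var).
    Falls back to the more stable Reg2 estimate whenever the Coef/Var
    estimates are degenerate (0 or +/-1) or the two k-choices disagree.
    """
    # Discriminative condition: Coef/Var deemed unreliable at 0 or +/-1.
    if abs(pcor_coef) in (0.0, 1.0) or abs(pcor_var) in (0.0, 1.0):
        return pcor_reg2
    k_x = 1 if rss_x[0] <= rss_x[1] else 2
    k_y = 1 if rss_y[0] <= rss_y[1] else 2
    # Both X- and Y-residuals must favour k = 2 to use the Var estimate.
    return pcor_var if (k_x == 2 and k_y == 2) else pcor_reg2

# Both RSS pairs favour k = 2, so the Var estimate is chosen.
print(mrss_select((5.0, 3.0), (6.0, 2.5), 0.45, 0.62, 0.60))
# k_x and k_y disagree, so the more stable Reg2 estimate is preferred.
print(mrss_select((2.0, 3.0), (6.0, 2.5), 0.45, 0.62, 0.60))
```

The numeric inputs above are toy values chosen only to exercise the two branches of the rule.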

Data Generation
To study the estimation efficiency of the Pcor estimation algorithms under high-dimensional conditions, we generate n centralised samples {x_j, y_j, z_j1, . . ., z_jp}. The controlling variables are generated as Z_i = u + e_i, where u = [u_1, . . ., u_n]^T and e_i = [e_1i, . . ., e_ni]^T, with u_j and e_ji generated independently from the normal distribution N(0, σ²) with variance σ² for i = 1, . . ., p. The samples X and Y are then generated from models (8) and (9) with error terms ε and ζ = (ωε + η)/√(1 + ω²), where ε_j and η_j are drawn i.i.d. from N(0, σ²). The Pearson correlation of ε and ζ gives the partial correlation coefficient Pcor = ω/√(1 + ω²). Notably, there is a one-to-one mapping between the true Pcor and the ω parameter.
Since our MRSS algorithm and the Reg2 algorithm perform essentially the same for Pcor < 0, our simulation focuses on true Pcor values in the range [0, 1], an interval prone to significant biases with existing methods. Let the true partial correlation coefficient vary as Pcor = 0, 0.05, 0.1, . . ., 0.95, with sample size n = 50, 100, . . ., 400, controlling-variable dimension p = 200, 500, 1000, 2000, 4000, and normal-distribution variance σ² = 1, 10, 40. For each (n, p) combination, we estimate the partial correlation coefficient over 200 replications using the aforementioned estimation algorithms. We use the software R (4.3.1) for our simulations.
Recognising that both sparse and non-sparse conditions are prevalent in real-world applications [3,28], we present examples under both conditions. To ensure comparability between the examples, the initial l coefficients of α and β are fixed under both conditions, where we select the number of highly correlated controlling variables as l = 6, 10, 14. For the non-sparse examples, the coefficients of α and β asymptotically converge to 0 at varying rates, with the coefficients beyond the (l + 1)-th starting at 0.05, which is significantly smaller than the initial l coefficients.
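The data-generating scheme can be sketched as follows (Python/NumPy; the shared-factor construction of Z and the specific coefficient values are illustrative, and the ζ construction assumes the √(1 + ω²) scaling described in the text, which yields Pcor = ω/√(1 + ω²)):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, sigma2, omega = 2000, 10, 1.0, 0.75
sd = np.sqrt(sigma2)

# Correlated controls: Z_i = u + e_i with a shared factor u.
u = rng.normal(scale=sd, size=(n, 1))
Z = u + rng.normal(scale=sd, size=(n, p))

alpha = np.zeros(p); alpha[:3] = [0.5, -0.5, 0.3]   # illustrative coefficients
beta = -alpha
eps = rng.normal(scale=sd, size=n)
eta = rng.normal(scale=sd, size=n)
zeta = (omega * eps + eta) / np.sqrt(1 + omega ** 2)

X = Z @ alpha + eps
Y = Z @ beta + zeta

# By construction, cor(eps, zeta) = omega / sqrt(1 + omega^2).
pcor_true = omega / np.sqrt(1 + omega ** 2)

# Empirical check via the OLS residual definition (p < n here).
Zc = Z - Z.mean(axis=0)
bx, *_ = np.linalg.lstsq(Zc, X - X.mean(), rcond=None)
by, *_ = np.linalg.lstsq(Zc, Y - Y.mean(), rcond=None)
pcor_hat = np.corrcoef((X - X.mean()) - Zc @ bx,
                       (Y - Y.mean()) - Zc @ by)[0, 1]
print(round(pcor_true, 3), round(pcor_hat, 3))
```

With ω = 0.75, the target Pcor is exactly 0.6, and the empirical estimate recovers it up to sampling noise, confirming the one-to-one mapping between ω and the true Pcor.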

•
Example 1: under sparse conditions. Let the coefficients α and β be non-zero for the initial l elements and zero for the rest, as follows: α = −β = (−0. ), where r is a tuning parameter that makes the (l + 1)-th element close to 0.05.

By MSE and RMSE
We assess the efficacy of the Pcor estimation algorithms using the mean square error (MSE) and root mean square error (RMSE) indices as follows:

MSE = (1/R) ∑_{i=1}^R (ρ̂(i) − ρ_0)²,  RMSE = √MSE,

These evaluation indicators reflect the performance of the Pcor estimation algorithms from various perspectives.
where ρ_0 is the true Pcor, and ρ̂(i) is the estimated Pcor in the i-th of R = 200 replications. Table 1 displays the mean of the MSE and RMSE (×10²) for the estimated Pcors over the true Pcor = 0, 0.05, . . ., 0.95 with l = 10, σ² = 1, n = 50, 100, 200, 400 and p = 200, 500, 1000, 2000, 4000 across Examples 1-4 using the various methods. Tables A1 and A2, which consider the means of the MSE and RMSE (×10²) of the estimated Pcors for the numbers of highly correlated controlling variables l = 6, 14, can be found in the Appendix. For small sample sizes (n < 100), all algorithms tend to underperform owing to the limited data information, with mean MSE and RMSE approximately ten times higher than for large sample sizes (n ≥ 100); even so, our MRSS algorithm remains competitive, with both MSE and RMSE in the same order of magnitude as the best-performing Lasso.Reg2. For large sample sizes (n ≥ 100), however, the MRSS algorithm's performance becomes notably superior. Specifically, the MRSS reduces the MSE by around 40% compared to the suboptimal MCP.Reg2, and this percentage grows with increasing n. The MRSS thus represents a significant improvement in algorithmic performance. Additionally, the MSE of the MRSS algorithm exhibits a slower increase with increasing controlling-variable dimension p, implying improved stability to some extent.
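The two indices are straightforward to compute from a vector of replicated estimates; a sketch in Python/NumPy with synthetic estimates (the bias and noise levels are invented for illustration, and the ×10² scaling matches the tables):

```python
import numpy as np

rng = np.random.default_rng(7)
rho0, R = 0.5, 200                                   # true Pcor, replications
rho_hat = rho0 - 0.02 + 0.05 * rng.normal(size=R)    # synthetic estimates

mse = np.mean((rho_hat - rho0) ** 2)
rmse = np.sqrt(mse)
print(round(mse * 100, 3), round(rmse * 100, 2))     # reported as x10^2
```

The MSE aggregates both the squared bias (here 0.02²) and the variance of the estimates across replications, which is why it is the headline index in Tables 1-3.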
To compare the performance of the different algorithms more intuitively, we calculated the percentage difference of the MSE as (MSE_MRSS − MSE_ALG)/MSE_ALG × 100%, with ALG being each of the algorithms listed above; the percentage difference of the RMSE is calculated similarly. Table 2 shows the average percentage difference of the MSE and RMSE relative to the MRSS algorithm for a small sample size (n = 50) and large sample sizes (n = 100, 200, 400) with the same settings as in Table 1. For the small sample size (n = 50), we observe a 10-20% decrease in MSE and RMSE for the MRSS algorithm relative to the Res algorithm, a 10-20% increase relative to Lasso.Reg2, and a slight change relative to the other algorithms. For large sample sizes (n = 100, 200, 400), the MRSS algorithm reduces the MSE by about 30-70% and the RMSE by 20-60% relative to the other algorithms, achieving effective control of the Pcor estimation error. These results further illustrate the superiority of the MRSS algorithm. For optimal Pcor estimation performance, we suggest using the MRSS algorithm with a minimum sample size of n = 100. For Examples 1-4, shifting from sparse to non-sparse conditions with increasing non-sparsity, we observe that all algorithms exhibit higher MSE and RMSE under non-sparse conditions than under sparse conditions, and that the MSE and RMSE increase with increasing non-sparsity. This can be attributed to the greater impact and more complicated correlations of the controlling variables, resulting in a less accurate estimate of the partial correlation. However, even in Example 4, with the strongest non-sparsity, the MRSS algorithm still performs well, possessing the smallest MSE and RMSE and outperforming the conventional algorithms. Especially under non-sparse conditions, the MRSS algorithm provides a dependable and accurate estimation of the Pcor despite the influence of complex controlling variables.

For Pcor Values on [0, 1]
To investigate the effectiveness of the Pcor estimation algorithms across Pcor values, we set a constant ratio of the dimension of the controlling variables to the sample size (i.e., a fixed p/n = 2, 10). Figure 1 displays the average estimated Pcor over 200 repetitions against the true Pcor for n = 100, 200, 400 and l = 6 in Example 1. The MRSS, MCP.Reg2, and MCP.Var are denoted in red, green and blue, respectively. When the Pcor is small, around Pcor < 0.5, the MRSS accurately tracks the true Pcor, performing similarly to the MCP.Reg2. When the Pcor is large, around Pcor > 0.5, the MRSS performs comparably to the MCP.Var, falling only slightly behind it. Essentially, the MRSS effectively amalgamates the strengths of both the MCP.Reg2 and MCP.Var algorithms, reducing their potential weaknesses in Pcor estimation. For a small sample size n = 100, the MRSS leads to a significant improvement in the estimation for large Pcor in [0, 1], but still shows considerable estimation bias for small Pcor owing to the limited sample size and information. For a large sample size n ≥ 200, the MRSS effectively reduces the Pcor estimation bias for Pcor > 0. Consequently, enlarging the sample size substantially boosts the MRSS estimation accuracy, even if the ratio p/n of the controlling-variable dimension to the sample size increases from 2 to 10.

Parameter Sensitivity
We investigate the sensitivity of the performance of the MRSS algorithm to different parameter settings, such as variance and sparsity.This allows us to explore the robustness of algorithms under different parameter configurations.

For Variance
We set a variance parameter σ² in the data generation to test the stability of our algorithm under varying variance. Table 3 shows the mean of the MSE (×10²) and RMSE (×10²) for the estimated Pcors over the true Pcor = 0, 0.05, . . ., 0.95 with different variances σ² = 1, 10, 40 and l = 10, for small sample sizes (n = 50, 100) and large sample sizes (n = 200, 400) in Examples 1-4. We discover that, as the variance σ² increases from 1 to 40, the MSE and RMSE remain consistent across the examples and sample sizes. This indicates that our MRSS algorithm is highly robust to variance and retains good stability.

For Sparsity
To evaluate the effectiveness of the algorithms under different sparsity conditions, we set the data generation to develop from sparse to non-sparse, with an increasingly non-sparse convergence rate from Example 1 to Example 4. This corresponds to a greater influence of the controlling variables as we progress through the examples. From Tables 1-3, our observations show that the MRSS algorithm performs well in all examples. For moderate non-sparse convergence rates, as witnessed in Examples 2-3, MRSS demonstrates both low MSE and RMSE, comparable to the sparse conditions of Example 1. As the rate of non-sparsity and the impact of the controlling variables increase in Example 4, even the best-performing MRSS encounters difficulties in reducing the estimation bias. Nevertheless, the MRSS algorithm remains the most favoured choice for estimating the Pcor under both sparse and non-sparse conditions. If it is possible to analyse the degree of non-sparsity of the initial data, then we can obtain a better understanding of the algorithm's error margin.
Another indication of the sparsity strength is the number of highly correlated controlling variables l. Figure 2 illustrates the performance of the featured algorithms for varying l = 6, 10, 14, contrasting the average estimated Pcor with the true Pcor in Example 2, with n = 100, p = 200 in the first row and n = 200, p = 2000 in the second. As l increases, the interference from the controlling variables in the estimation process becomes more pronounced, leading to a heightened estimation bias. However, the MRSS algorithm consistently shows optimal performance throughout the entire [0, 1] interval. Remarkably, despite the high interference level at l = 14, MRSS keeps its estimates in close alignment with the diagonal, in contrast to its counterparts. Table 4 shows the mean of the MSE and RMSE for l = 6, 10, 14. As l increases, both the MSE and RMSE of the MRSS algorithm increase, but they remain only slightly behind the best performer for small samples and significantly better than the other algorithms for large samples. These results demonstrate the robustness, stability, and precision advantages of the MRSS algorithm.

Summaries
Based on numerous simulations, our study examines the practicality and effectiveness of the MRSS algorithm in a variety of scenarios, providing valuable insights into its accuracy. We provide empirical evidence that MRSS effectively incorporates the strengths of the MCP.Reg2 and MCP.Var algorithms and reduces the potential weaknesses of Pcor estimation, especially in challenging environments with high-dimensional sparse and non-sparse conditions. For larger sample sizes (n ≥ 100), the MRSS algorithm reduces the MSE and RMSE by approximately 30-70% compared to the other algorithms and effectively controls the Pcor estimation errors. For small sample sizes (n < 100), a reduction of 10-20% is observed in MSE and RMSE for the MRSS algorithm compared to the Res algorithm, an increase of 10-20% compared to Lasso.Reg2, and a slight change compared to the other algorithms. The sensitivity analysis over various variance and sparsity parameters demonstrates the benefits of the MRSS algorithm in terms of robustness, stability, and accuracy. As the variance increases from 1 to 40, the MSE and RMSE remain consistent across the examples and sample sizes, showing that our MRSS algorithm is remarkably resilient to variability and maintains excellent stability. As the level of sparsity decreases (from Examples 1-4, or from l = 6 to 14), the MSE and RMSE of the MRSS algorithm increase but remain within the same order of magnitude. Even the optimal MRSS algorithm undergoes a significant rise in MSE and RMSE for Example 4 and l = 14, as the escalation of non-sparse and intricate controlling variables brings forth certain systematic errors.

Real Data Analysis
A distinguishing feature of financial markets is the observed correlation among the price movements of various financial assets. A prevalent feature is the existence of a substantial cross-correlation between the simultaneous time evolution of stock returns [29]. In numerous instances, a strong correlation does not necessarily imply a significant direct relationship. For instance, two stocks in the same market may be subject to shared macroeconomic or investor-psychology influences. Therefore, to examine the direct correlation between these stocks, it is necessary to eliminate the common drivers represented by the market index. The Pcor meets this requirement by assessing the direct relationship between two stocks after removing the market impacts of the controlling variables. When the Pcor is accurately estimated, it is possible to evaluate the impact of diverse factors (e.g., economic sectors, other markets, or macroeconomic factors) on a specific stock. The resulting partial correlation data may be utilised in fields such as stock market risk management, stock portfolio optimisation, and financial control [7,8]. Moreover, the Pcor can also indicate the interdependence and influence of industries in the context of global integration. These techniques for analysing the Pcor can provide valuable information on the correlations between different assets and different sectors of the economy, as they are generalisable and can be applied to other asset types and cross-asset relationships in financial markets. This information is beneficial for practitioners and policymakers.
We chose 100 stocks with substantial market capitalisation and robust liquidity from the Shanghai Stock Exchange (SSE) market. These stocks can comprehensively represent the overall performance of listed stock prices in China's A-share market. We then downloaded their daily adjusted closing prices from Yahoo Finance from January 2018 to August 2023 and removed the missing data. Here, a sufficient sample size of n = 1075 was chosen to ensure the effectiveness of the algorithms and to limit the bias in the Pcor estimation. For each pair of the 100 stocks, we estimate their Pcor by setting the remaining stocks as the corresponding controlling variables and construct the estimated Pcor matrix. The Pcor matrix reveals the intrinsic correlation between two stocks after removing the influence of the rest of the stock market.
Figure 3 presents the estimated Pcor matrices for 100 stocks from the SSE market using the MCP.Reg2, MCP.Var and MRSS algorithms. Blue signifies Pcor = 1, while red represents Pcor = −1. Whilst the MCP.Coef, MCP.Var, and RSS2 algorithms all estimate the Pcor as 0 when the true Pcor approaches 0, our proposed MRSS algorithm resembles MCP.Reg2, which estimates an accurate Pcor for weak partial correlations. Thus, the MRSS is capable of effectively estimating weak partial correlations. When dealing with high Pcor values and strong partial correlations, we find that the MCP.Var algorithm overestimates the Pcor as a result of the divergence in stock prices. For two stocks with higher stock prices, the Pcor estimated by the MCP.Var algorithm tends to be overestimated, often close to 1; the MRSS effectively solves this problem. Notably, as a result of incorporating the MCP.Var algorithm, the MRSS algorithm amplifies certain partial correlations that are not significant under MCP.Reg2. These results can also be seen in Table 5. The MRSS estimates these as stronger partial correlations, resulting in improved clarity of the partial correlation structure. Figure 4 shows the stocks' Pcor network for the top-100 and top-50 pairs of Pcor estimates by the MRSS algorithm from the 100 SSE stocks. Each node represents a stock, coloured by its sector. The edge thickness represents the Pcor estimate between two nodes, with thick edges for Pcor > 0.4 and thin edges for Pcor < 0.4. Table 5 shows the stock pairs with their sector and Pcor estimates for all MRSS-estimated Pcor > 0.4 from the 100 SSE stocks, and Table 6 shows the corresponding stock pairs with their company name, business, and sector. Here, we use industry classifications from the Global Industry Classification Standard (GICS): Communication Services, Consumer Discretionary (C.D.), Consumer Staples, Energy, Financials, Health Care, Industrials, Information Technology (I.T.), Materials, Real Estate and Utilities. We find that two stocks connected in the partial
correlation network with a high Pcor are almost always in the same sector and operate in the same business. In addition, high Pcor values may indicate shareholding relationships between companies. For instance, the highly correlated 601398-601939-601288-601988-601328 (Financials) are all state-controlled banks that do not have a direct high-Pcor link with the city banks 601009-601166 (Financials). Stocks that do not belong to the same industry yet show a high Pcor may have certain other links behind them, such as 601519 (I.T.) and 601700 (Industrials) having a common major shareholder. After stripping out the other factors influencing the market, the Pcor represents the inherent, intrinsic correlation between two stocks in the same sector.
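The network in Figure 4 is built by ranking the unique stock pairs by their estimated Pcor, keeping the top-k pairs as edges, and drawing them thick or thin around the 0.4 threshold. A minimal sketch of that edge-selection step is below, using a random symmetric stand-in for the MRSS Pcor matrix (the matrix and threshold choice here are illustrative, not the paper's actual estimates):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for an estimated 100 x 100 Pcor matrix (symmetric, unit diagonal).
p = 100
a = rng.uniform(-1, 1, size=(p, p))
pcor = (a + a.T) / 2
np.fill_diagonal(pcor, 1.0)

# Rank the p*(p-1)/2 unique pairs (i < j) by their Pcor estimate.
iu = np.triu_indices(p, k=1)
order = np.argsort(pcor[iu])[::-1]                     # descending Pcor
top50 = [(iu[0][k], iu[1][k], pcor[iu][k]) for k in order[:50]]

# Split the kept edges at the 0.4 threshold used for edge thickness.
thick = [(i, j, v) for i, j, v in top50 if v > 0.4]    # thick edges
thin = [(i, j, v) for i, j, v in top50 if v <= 0.4]    # thin edges
```

In the paper's application the node indices would be mapped back to ticker codes and coloured by GICS sector before plotting.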
As societies become increasingly integrated, the productive activities of different industries become interdependent and interact with each other. Categorising a company into only one industry does not reflect its overall performance and associated risks. Many listed companies in the stock market belong to conglomerates and operate in different industry sectors, so it is natural for the performance of these companies to be affected by multiple industries. Therefore, beyond showing the correlations within industries, the Pcor also reveals correlations between two industries that are linked together by two stocks in different industries. For example, the partial correlation between the Bank of Communications (601328) and PetroChina (601857), with Pcor = 0.258, links the Energy (600028-601857, in orange) and Financials (601398-601939-601288-601988-601328, in dark blue) sectors of state-owned assets.
Overall, the MRSS algorithm amalgamates the characteristics of MCP.Reg2 and MCP.Var, enhancing the estimation of strong partial correlations while effectively estimating weak ones, ultimately revealing the stock correlations and improving the estimation accuracy of the integrated algorithm. Reducing the computational complexity of our minimised-RSS integration algorithm to decrease computing time represents a core issue for future research. Additionally, conducting in-depth theoretical research on the MRSS algorithm, including a proof analysis of consistency and convergence, will be an essential direction for our next steps. Further refinement of the theoretical proofs and an in-depth investigation of the error convergence speed may uncover the reasons for the non-negligible systematic estimation bias that all current algorithms exhibit when the Pcor is positive. Meanwhile, expanding the use of the MRSS algorithm to a wider range of fields is a focal point of our future research. Concerning financial data, we intend to thoroughly examine the partial correlations between financial data beyond stocks and advise on relevant policies.

Figure 2. Average Pcor against true Pcor for n = 100, p = 200 (first row) and n = 200, p = 2000 (second row), with l = 6, 10, 14 in Example 2.

Figure 4. Stocks' Pcor network for the top-100 and top-50 pairs of Pcor estimates by the MRSS algorithm from 100 SSE stocks. Each node represents a stock, coloured by its sector. The edge thickness represents the Pcor estimate between two nodes, with thick edges for Pcor > 0.4 and thin edges for Pcor < 0.4.

Table 2. The average percentage difference of the MSE and RMSE compared to the MRSS algorithm for a small sample size (n = 50) and large sample sizes (n = 100, 200, 400) with the same settings in

Table 5. Stock pairs with their sector and Pcor estimates for all the MRSS-estimated Pcor > 0.4 by different algorithms from 100 SSE stocks.