Abstract
The partial correlation coefficient (Pcor) is a vital statistical tool employed across various scientific domains to decipher intricate relationships and reveal inherent mechanisms. However, existing methods for estimating Pcor often pay little attention to its accurate calculation. In response, this paper introduces a minimum residual sum of squares Pcor estimation method (MRSS), a high-precision approach tailored to high-dimensional scenarios. Notably, the MRSS algorithm reduces the estimation bias encountered with positive Pcor. Through simulations on high-dimensional data, encompassing both sparse and non-sparse conditions, MRSS consistently mitigates the estimation bias for positive Pcors, surpassing the other algorithms discussed. For instance, for large sample sizes () with Pcor > 0, the MRSS algorithm reduces the MSE and RMSE by about 30–70% compared to other algorithms. The robustness and stability of the MRSS algorithm are demonstrated by a sensitivity analysis with respect to the variance and sparsity parameters. Stock data from China's A-share market are employed to showcase the practicality of the MRSS methodology.
MSC:
62H20; 62J07
1. Introduction
The partial correlation coefficient (Pcor) measures the correlation between two random variables, X and Y, after accounting for the effects of the controlling variables Z, and is denoted by $\rho_{XY\cdot Z}$. The Pcor essentially quantifies the unique relationship between X and Y after removing the correlations between X and Z and between Y and Z [1]. This coefficient provides a more thorough understanding of the connection between variables, untainted by the influence of confounding factors. Unlike the Pearson correlation coefficient, which only captures the direct correlation between random variables, the Pcor enables the identification of whether correlations stem from intermediary variables. This distinction enhances the precision and validity of statistical analyses.
The Pcor is a fundamental statistical tool for investigating intricate relationships and gaining a more profound comprehension of the underlying mechanisms in a variety of scientific fields, such as psychology, biology, economics, and the social sciences. When examining genetic markers and illness outcomes, biologists have used the Pcor to identify correlations while accounting for potential confounding factors [2,3,4]. Marrelec et al. utilised the partial correlation matrix to explore large-scale functional brain networks through functional MRI [5]. In the field of economics, the Pcor assists in comprehending complex connections, including the interplay between interest rates and inflation, while considering the influence of other variables [6]. The financial industry also employs the Pcor to interpret connections and relationships between stocks in the financial markets [7,8]. For example, Michis proposed a wavelet procedure for estimating the Pcor between stock market returns over different time scales and implemented it for portfolio diversification [9]. Using partial correlations within a complex network framework, Singh et al. examined the degree of globalisation and regionalisation of stock market linkages and how these linkages vary across different economic or market cycles [10]. Meanwhile, the Gaussian graphical model (GGM) technique has recently gained popularity in psychology for describing the relationships between observed variables; it employs Pcors to represent pairwise interdependencies while controlling for the influence of all other variables [11,12,13]. In the geosciences, a correlation analysis based on the Pcor of the fractal dimensions of the H, Z and D component variations has been used to study geomagnetic field component variations in Russia [14].
Several methodologies have been proposed over the years to estimate the Pcor in statistical analyses. For instance, Peng et al. introduced a Pcor estimation technique that relies on the sparsity of the partial correlation matrix and utilises sparse regression methods [3]. Khare et al. suggested a high-dimensional graphical model selection approach based on the use of pseudolikelihood [15]. Kim provided the R package "ppcor" for the fast calculation of partial and semi-partial correlation coefficients [16]. Huang et al. introduced the kernel partial correlation coefficient as a measure of the conditional dependence between two random variables in various topological spaces [17]. Van Aert and Goos focused on calculating the sampling variance of the Pcor [18]. Hu and Qiu proposed a statistical inference procedure for the Pcor under the high-dimensional nonparanormal model [19]. However, these methods mainly centre on determining whether or not the partial correlation coefficient is zero, without adequate regard for the precision of the Pcor calculation and the algorithm's efficacy. We analysed multiple high-dimensional algorithms and discovered notable Pcor estimation biases, particularly for positive Pcor; even with larger sample sizes, these biases persisted. Motivated by these findings, our primary goal is to put forward a Pcor estimation algorithm that increases estimation precision and diminishes the estimation bias for positive Pcor values.
This paper reviews current methods for estimating Pcor in high-dimensional data. We introduce a novel minimum residual sum of squares (MRSS) Pcor estimation method under high-dimensional conditions, aiming to mitigate the estimation bias for positive Pcor. The algorithm’s effectiveness is validated through simulation studies under sparse and non-sparse conditions and real data analysis on stock markets.
The sections are structured as follows: Section 2 outlines definitions and corresponding formulae for calculating Pcor, and examines common algorithms for estimating Pcor. Section 3 presents our Minimum Residual Sum of Squares Pcor estimation, designed to mitigate estimation bias for positive Pcor. In Section 4, we demonstrate the effectiveness of our proposed algorithm through simulation studies on high-dimensional data under both sparse and non-sparse conditions. Section 5 provides an analysis of real data related to stock markets, while Section 6 contains the conclusion.
2. Estimation for Partial Correlation Coefficient
2.1. Definition of Pcor
The partial correlation coefficient is classically defined as the correlation coefficient between the regression residuals from the linear models of the two variables on the controlling variables. Let X and Y be two random variables, and let $Z = (Z_1, \dots, Z_p)^\top$ be the p-dimensional controlling variable. Consider the linear regression models of X and Y, respectively, on the controlling variable Z,
$$X = Z^\top \beta_X + \varepsilon_X, \qquad Y = Z^\top \beta_Y + \varepsilon_Y,$$
where $\varepsilon_X$ and $\varepsilon_Y$ are error terms. The partial correlation coefficient between X and Y conditional on Z, denoted by $\rho_{XY\cdot Z}$, is defined as the correlation coefficient between the regression residuals $\varepsilon_X$ and $\varepsilon_Y$, as follows
$$\rho_{XY\cdot Z} = \mathrm{Corr}(\varepsilon_X, \varepsilon_Y) = \frac{\mathrm{Cov}(\varepsilon_X, \varepsilon_Y)}{\sqrt{\mathrm{Var}(\varepsilon_X)\,\mathrm{Var}(\varepsilon_Y)}}, \qquad (1)$$
where $\mathrm{Corr}(\cdot,\cdot)$ denotes the correlation coefficient of two random variables, $\mathrm{Cov}(\cdot,\cdot)$ the covariance of two random variables, and $\mathrm{Var}(\cdot)$ the variance of a random variable. Let the sample size be n. In conventional low-dimensional cases ($n > p$), ordinary least squares (OLS) is used to compute the residuals $\varepsilon_X$ and $\varepsilon_Y$, and the Pcor is then computed as the correlation coefficient of these residuals. However, the OLS method is not practical in high-dimensional cases ($p > n$); regularisation methods are introduced later to deal with such cases.
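To make the residual-based definition concrete, the following minimal R sketch (base R only; the data-generating choices and variable names are ours, purely for illustration) computes the Pcor in a low-dimensional setting by correlating the OLS residuals of X and Y on Z, exactly as in Equation (1).

```r
## Minimal illustration of the residual-based Pcor definition (low-dimensional, n > p).
## The data-generating mechanism below is hypothetical, not the paper's simulation design.
set.seed(1)
n <- 500; p <- 5
z <- matrix(rnorm(n * p), n, p)                  # controlling variables Z
x <- drop(z %*% rep(0.5, p)) + rnorm(n)          # X depends on Z
y <- drop(z %*% rep(-0.3, p)) + 0.6 * x + rnorm(n)  # Y depends on Z and on X

res_x <- residuals(lm(x ~ z))                    # OLS residual of X on Z
res_y <- residuals(lm(y ~ z))                    # OLS residual of Y on Z
cor(res_x, res_y)                                # Pcor = correlation of the two residuals
```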
2.2. Calculation Formulae of Pcor
2.2.1. Based on Concentration Matrix
The concentration matrix can also be used to calculate the Pcor. Let $W = (X, Y, Z^\top)^\top$ and let $\Sigma = \mathrm{Cov}(W)$ be its covariance matrix. Assuming that $\Sigma$ is a non-singular matrix, the concentration matrix is denoted as $\Omega = \Sigma^{-1} = (\omega_{ij})$. Consider the following linear regression of $(X, Y)^\top$ on Z,
$$\begin{pmatrix} X \\ Y \end{pmatrix} = b\,Z + \begin{pmatrix} \varepsilon_X \\ \varepsilon_Y \end{pmatrix},$$
where b is the regression coefficient matrix and $(\varepsilon_X, \varepsilon_Y)^\top$ is the regression error. We have
$$b = \Sigma_{(XY)Z}\,\Sigma_{ZZ}^{-1},$$
expressed through the corresponding blocks of $\Sigma$. The regression residual $(\varepsilon_X, \varepsilon_Y)^\top$ is independent of Z, and its covariance matrix can be computed by
$$\mathrm{Cov}\begin{pmatrix} \varepsilon_X \\ \varepsilon_Y \end{pmatrix} = \Sigma_{(XY)(XY)} - \Sigma_{(XY)Z}\,\Sigma_{ZZ}^{-1}\,\Sigma_{Z(XY)} = \begin{pmatrix} \omega_{XX} & \omega_{XY} \\ \omega_{XY} & \omega_{YY} \end{pmatrix}^{-1}.$$
According to the definition in Equation (1), the partial correlation coefficient can therefore be computed by
$$\rho_{XY\cdot Z} = -\frac{\omega_{XY}}{\sqrt{\omega_{XX}\,\omega_{YY}}}.$$
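As a quick sanity check of the concentration-matrix route, the sketch below (again on hypothetical simulated data) inverts the sample covariance matrix of (X, Y, Z) and applies the relation above; in low dimensions it agrees closely with the residual-based estimate.

```r
## Pcor from the concentration matrix Omega = Sigma^{-1} (low-dimensional illustration).
set.seed(1)
n <- 500; p <- 5
z <- matrix(rnorm(n * p), n, p)
x <- drop(z %*% rep(0.5, p)) + rnorm(n)
y <- drop(z %*% rep(-0.3, p)) + 0.6 * x + rnorm(n)

omega <- solve(cov(cbind(x, y, z)))              # sample concentration matrix
pcor_conc <- -omega[1, 2] / sqrt(omega[1, 1] * omega[2, 2])
pcor_res  <- cor(residuals(lm(x ~ z)), residuals(lm(y ~ z)))
c(concentration = pcor_conc, residual_based = pcor_res)   # the two routes agree closely
```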
2.2.2. Based on Additional Regression Models
Additional linear regression models are introduced to calculate the Pcor. Consider new linear regression models of X on (Y, Z) and of Y on (X, Z), respectively,
$$X = \beta_{XY} Y + Z^\top \beta_{XZ} + \varepsilon'_X, \qquad Y = \beta_{YX} X + Z^\top \beta_{YZ} + \varepsilon'_Y,$$
where $\varepsilon'_X$ and $\varepsilon'_Y$ are regression error terms. Peng et al. [3] established the connection between the above regression coefficients and the Pcor, verifying that $\beta_{XY} = \rho_{XY\cdot Z}\sqrt{\omega_{YY}/\omega_{XX}}$ and $\beta_{YX} = \rho_{XY\cdot Z}\sqrt{\omega_{XX}/\omega_{YY}}$ hold. Then, we derive $\beta_{XY}\beta_{YX} = \rho_{XY\cdot Z}^2$ with $\mathrm{sign}(\beta_{XY}) = \mathrm{sign}(\rho_{XY\cdot Z})$. Thus, the partial correlation coefficient between X and Y can be calculated by the formula below,
$$\rho_{XY\cdot Z} = \mathrm{sign}(\beta_{XY})\sqrt{\beta_{XY}\,\beta_{YX}},$$
where $\mathrm{sign}(\cdot)$ is the sign function.
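The coefficient identity just stated is easy to verify numerically in low dimensions. The sketch below (hypothetical data, plain OLS instead of a penalised fit) compares sign(b_xy)·sqrt(b_xy·b_yx) with the residual-based estimate.

```r
## Checking the coefficient identity rho = sign(b_xy) * sqrt(b_xy * b_yx) with OLS (n > p).
set.seed(2)
n <- 2000; p <- 5
z <- matrix(rnorm(n * p), n, p)
x <- drop(z %*% rep(0.4, p)) + rnorm(n)
y <- drop(z %*% rep(-0.2, p)) + 0.5 * x + rnorm(n)

b_xy <- coef(lm(x ~ y + z))["y"]     # coefficient of Y in the regression of X on (Y, Z)
b_yx <- coef(lm(y ~ x + z))["x"]     # coefficient of X in the regression of Y on (X, Z)
pcor_coef <- sign(b_xy) * sqrt(b_xy * b_yx)
pcor_res  <- cor(residuals(lm(x ~ z)), residuals(lm(y ~ z)))
c(coefficient_identity = unname(pcor_coef), residual_based = pcor_res)
```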
Consider linear regression models of Y on Z and of Y on (X, Z), respectively,
$$Y = Z^\top \beta_Y + \varepsilon_Y, \qquad Y = \beta_{YX} X + Z^\top \beta_{YZ} + \varepsilon'_Y,$$
where $\varepsilon_Y$ and $\varepsilon'_Y$ are error terms. The partial correlation coefficient can also be calculated as follows [20]
$$\rho_{XY\cdot Z}^2 = 1 - \frac{\mathrm{Var}(\varepsilon'_Y)}{\mathrm{Var}(\varepsilon_Y)}, \qquad \mathrm{sign}(\rho_{XY\cdot Z}) = \mathrm{sign}(\beta_{YX}),$$
so that, in terms of the residual sums of squares of the two fitted models, $\rho_{XY\cdot Z} = \mathrm{sign}(\beta_{YX})\sqrt{1 - \mathrm{RSS}'_Y/\mathrm{RSS}_Y}$.
2.3. Regularisation Regression for High-Dimensional Cases
Suppose we have centralised samples $(X_i, Y_i, Z_i^\top)^\top$, $i = 1, \dots, n$, observed i.i.d. from $(X, Y, Z^\top)^\top$ with $Z_i \in \mathbb{R}^p$. Let $\mathbf{X} = (X_1, \dots, X_n)^\top$, $\mathbf{Y} = (Y_1, \dots, Y_n)^\top$ and $\mathbf{Z} = (Z_1, \dots, Z_n)^\top \in \mathbb{R}^{n \times p}$. We consider matrix-type linear regression models as follows,
$$\mathbf{X} = \mathbf{Z}\beta_X + \boldsymbol{\varepsilon}_X, \quad (8) \qquad\qquad \mathbf{Y} = \mathbf{Z}\beta_Y + \boldsymbol{\varepsilon}_Y, \quad (9)$$
where $\boldsymbol{\varepsilon}_X$ and $\boldsymbol{\varepsilon}_Y$ are error terms. If we estimate the regression coefficients $\beta_X$ and $\beta_Y$, then we can calculate the estimated residuals $\hat{\boldsymbol{\varepsilon}}_X = \mathbf{X} - \mathbf{Z}\hat\beta_X$ and $\hat{\boldsymbol{\varepsilon}}_Y = \mathbf{Y} - \mathbf{Z}\hat\beta_Y$. According to the definition of the Pcor, we can estimate the Pcor as follows
$$\hat\rho_{XY\cdot Z} = \frac{\hat{\boldsymbol{\varepsilon}}_X^\top \hat{\boldsymbol{\varepsilon}}_Y}{\|\hat{\boldsymbol{\varepsilon}}_X\|\,\|\hat{\boldsymbol{\varepsilon}}_Y\|}, \quad (10)$$
where $\|\cdot\|$ denotes the Euclidean norm of a vector.
In high-dimensional ($p > n$) situations, penalty functions and regularisation regression methods can be introduced to estimate the regression coefficients of the regression models. Regularisation methods address overfitting in statistical modelling by adding a penalty to the loss function, constraining the coefficient magnitudes. Let $p_\lambda(\cdot)$ be the penalty function with a tuning parameter $\lambda > 0$; for example, the regularisation estimate for model (8) is given by
$$\hat\beta_X = \arg\min_{\beta \in \mathbb{R}^p}\left\{ \frac{1}{2n}\|\mathbf{X} - \mathbf{Z}\beta\|^2 + \sum_{j=1}^{p} p_\lambda(|\beta_j|) \right\},$$
where the penalty can be widely chosen among the Lasso penalty [21], the Ridge penalty [22], the SCAD penalty [23], the Elastic net [24], the Fused lasso [25], the MCP penalty [26], and other penalty functions. In this paper, the Lasso regularisation, with penalty $p_\lambda(|\beta_j|) = \lambda|\beta_j|$, is implemented by the R package "glmnet" [27], and the MCP, with penalty $p_\lambda(|\beta_j|) = \lambda\int_0^{|\beta_j|}\left(1 - \frac{x}{\gamma\lambda}\right)_+ \mathrm{d}x$ for a regularisation parameter $\gamma > 1$, is implemented by the R package "ncvreg".
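To illustrate how these penalised fits produce the residuals needed later, the sketch below obtains Lasso residuals of X on Z with cv.glmnet; the tuning choice (lambda.min) and the simulated data are our own assumptions, not prescriptions from the paper. An analogous MCP fit via ncvreg is indicated in the comments.

```r
## Penalised regression of X on Z in a high-dimensional setting (p > n), giving the residual
## needed for the Pcor estimators; a sketch with our own tuning choices, not the paper's code.
library(glmnet)
set.seed(3)
n <- 100; p <- 500
Z <- matrix(rnorm(n * p), n, p)
X <- drop(Z[, 1:5] %*% rep(0.5, 5)) + rnorm(n)   # only the first 5 controls matter (sparse truth)

cv_lasso <- cv.glmnet(Z, X, alpha = 1)                       # Lasso with cross-validated lambda
res_lasso <- X - drop(predict(cv_lasso, newx = Z, s = "lambda.min"))

## MCP residuals via ncvreg (optional, if the package is installed):
## library(ncvreg)
## cv_mcp  <- cv.ncvreg(Z, X, penalty = "MCP")
## res_mcp <- X - drop(predict(cv_mcp$fit, Z, lambda = cv_mcp$lambda.min))
```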
2.4. Existing Pcor Estimation Algorithms
To investigate high-dimensional Pcor estimation, we first present some existing methods that are suitable for both sparse and non-sparse conditions. Weighing the advantages and disadvantages of these methods, we then propose a new high-dimensional Pcor estimation method: MRSS, the minimum residual sum of squares partial correlation coefficient estimation algorithm.
2.4.1. Res Algorithm
The Res algorithm follows directly from the Pcor definition and is implemented as follows. First, we apply regularisation regression (Lasso or MCP) to the linear models (8) and (9) to obtain the estimated regression coefficients $\hat\beta_X$ and $\hat\beta_Y$; then we calculate the estimated residuals $\hat{\boldsymbol{\varepsilon}}_X = \mathbf{X} - \mathbf{Z}\hat\beta_X$ and $\hat{\boldsymbol{\varepsilon}}_Y = \mathbf{Y} - \mathbf{Z}\hat\beta_Y$; finally, we estimate the Pcor by formula (10).
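A compact sketch of the Res algorithm as described above (Lasso flavour; the helper name res_pcor, the cross-validated tuning, and the toy data are our own choices):

```r
## Res algorithm sketch (Lasso flavour): regularised residuals of X and Y on Z, then their correlation.
library(glmnet)

res_pcor <- function(X, Y, Z) {
  rx <- X - drop(predict(cv.glmnet(Z, X), newx = Z, s = "lambda.min"))  # residual of X on Z
  ry <- Y - drop(predict(cv.glmnet(Z, Y), newx = Z, s = "lambda.min"))  # residual of Y on Z
  cor(rx, ry)                                                           # formula (10)
}

set.seed(4)
n <- 200; p <- 500
Z <- matrix(rnorm(n * p), n, p)
X <- drop(Z[, 1:5] %*% rep(0.5, 5)) + rnorm(n)
Y <- drop(Z[, 1:5] %*% rep(-0.3, 5)) + 0.4 * X + rnorm(n)
res_pcor(X, Y, Z)
```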
2.4.2. Reg2 Algorithm
The Reg2 algorithm can more effectively remove the influence of Z on X and Y using the new regressions below. Consider new linear regression models as follows
where the error terms are defined analogously and the estimators $\hat\beta_X$ and $\hat\beta_Y$ are obtained from the Lasso or MCP regularisation regressions of models (8) and (9). Then, we apply ordinary least squares (OLS) to models (11) and (12) and denote the resulting second-stage estimators accordingly. Computing the new residuals from these refitted models, we finally estimate the Pcor by the Reg2 algorithm as the sample correlation coefficient of the two new residual vectors.
2.4.3. Coef and Var Algorithm
The Coef and Var algorithms are constructed from the regression coefficients that appear in the Pcor formulae (5) and (6). Consider the following linear regression models,
$$\mathbf{X} = \beta_{XY}\mathbf{Y} + \mathbf{Z}\beta_{XZ} + \boldsymbol{\varepsilon}'_X, \quad (13) \qquad\qquad \mathbf{Y} = \beta_{YX}\mathbf{X} + \mathbf{Z}\beta_{YZ} + \boldsymbol{\varepsilon}'_Y, \quad (14)$$
where $\boldsymbol{\varepsilon}'_X$ and $\boldsymbol{\varepsilon}'_Y$ are error terms. We then implement MCP regularisation on models (13) and (14) and obtain the estimated first-term regression coefficients $\hat\beta_{XY}$ and $\hat\beta_{YX}$, together with the estimated residual variances $\hat\sigma^2_X$ and $\hat\sigma^2_Y$. Finally, the Coef algorithm estimates the Pcor as $\hat\rho_{\mathrm{Coef}} = \mathrm{sign}(\hat\beta_{XY})\sqrt{\hat\beta_{XY}\hat\beta_{YX}}$, and the Var algorithm estimates the Pcor as $\hat\rho_{\mathrm{Var}} = \hat\beta_{XY}\sqrt{\hat\sigma^2_Y/\hat\sigma^2_X}$.
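The following sketch gives one possible implementation of the Coef- and Var-type estimates with MCP via the ncvreg package; the design construction, the coefficient/variance extraction, and the small guard against a negative product are our reading of Section 2.4.3, not the authors' code.

```r
## Coef/Var-type sketch using MCP via ncvreg (our own reading of Section 2.4.3, not the authors' code).
library(ncvreg)

coef_var_pcor <- function(X, Y, Z) {
  DX <- cbind(Y, Z)                          # design for model (13): X regressed on (Y, Z)
  DY <- cbind(X, Z)                          # design for model (14): Y regressed on (X, Z)
  bx <- coef(cv.ncvreg(DX, X, penalty = "MCP"))   # coefficients at the CV-selected lambda
  by <- coef(cv.ncvreg(DY, Y, penalty = "MCP"))
  b_xy <- bx[2]; b_yx <- by[2]               # first-term coefficients (position 1 is the intercept)
  rx <- X - (bx[1] + drop(DX %*% bx[-1]))    # residual of model (13)
  ry <- Y - (by[1] + drop(DY %*% by[-1]))    # residual of model (14)
  c(coef = unname(sign(b_xy) * sqrt(max(b_xy * b_yx, 0))),   # guard against a negative product (our device)
    var  = unname(b_xy * sqrt(var(ry) / var(rx))))
}
## Usage (with X, Y, Z as in the previous sketch): coef_var_pcor(X, Y, Z)
```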
2.4.4. RSS2 Algorithm
The RSS2 algorithm is based on the residual sum of squares formula (7). First, we implement the MCP regularisation on model (9), estimate the residual $\hat{\boldsymbol{\varepsilon}}_Y$, and compute the residual sum of squares (RSS) $\mathrm{RSS}_Y = \|\hat{\boldsymbol{\varepsilon}}_Y\|^2$. Similarly, we implement the MCP regularisation on model (14) and estimate the first-term regression coefficient $\hat\beta_{YX}$, the residual $\hat{\boldsymbol{\varepsilon}}'_Y$, and the RSS $\mathrm{RSS}'_Y = \|\hat{\boldsymbol{\varepsilon}}'_Y\|^2$. We then obtain the Pcor estimate $\hat\rho_1 = \mathrm{sign}(\hat\beta_{YX})\sqrt{1 - \mathrm{RSS}'_Y/\mathrm{RSS}_Y}$. Next, we switch the roles of X and Y and repeat the above steps: implementing the MCP regularisation on model (8) and model (13), we obtain the RSSs $\mathrm{RSS}_X$ and $\mathrm{RSS}'_X$ and the estimated first-term coefficient $\hat\beta_{XY}$, which yield another Pcor estimate $\hat\rho_2 = \mathrm{sign}(\hat\beta_{XY})\sqrt{1 - \mathrm{RSS}'_X/\mathrm{RSS}_X}$. Finally, the RSS2 estimate of the Pcor is obtained by combining $\hat\rho_1$ and $\hat\rho_2$.
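A hedged sketch of an RSS2-style computation, expressing the squared Pcor as the relative drop in RSS when the other variable is added to the controls; the helper names and the final averaging of the two directions are our assumptions.

```r
## RSS2-type sketch: the squared Pcor as a relative drop in RSS (our reading of Section 2.4.4).
library(ncvreg)

rss2_pcor <- function(X, Y, Z) {
  one_direction <- function(resp, other, Z) {
    b0 <- coef(cv.ncvreg(Z, resp, penalty = "MCP"))                # model without the other variable
    b1 <- coef(cv.ncvreg(cbind(other, Z), resp, penalty = "MCP"))  # model including the other variable
    rss0 <- sum((resp - (b0[1] + drop(Z %*% b0[-1])))^2)
    rss1 <- sum((resp - (b1[1] + drop(cbind(other, Z) %*% b1[-1])))^2)
    sign(b1[2]) * sqrt(max(1 - rss1 / rss0, 0))                    # sign from the first-term coefficient
  }
  rho1 <- one_direction(Y, X, Z)     # models (9) and (14)
  rho2 <- one_direction(X, Y, Z)     # models (8) and (13)
  mean(c(rho1, rho2))                # combining the two directions (averaging is our assumption)
}
```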
3. Minimum Residual Sum of Squares Pcor Estimation Algorithm
3.1. Motivation
From the comprehensive simulations in this paper, it is evident that the Pcor estimation methods discussed exhibit significant bias. This bias becomes more pronounced as the true Pcor increases, especially when the Pcor is positive. Therefore, further research is necessary to address this estimation bias in positive Pcor scenarios. While each algorithm has its merits, the Reg2 algorithm performs notably well when Pcor is below approximately . In contrast, the Coef and Var algorithm stands out with minimal bias when Pcor exceeds roughly . Our goal is to develop a method that synergises the strengths of both the Reg2 and Var algorithms.
Comparing the residuals from the Reg2 models (11) and (12) with those from the Coef and Var models (13) and (14), it is evident that the two candidate residuals of X share commonalities: both capture, in some sense, the information in X after excluding the effects of Y and Z. Similarly, the two candidate residuals of Y capture the information in Y after removing the influences of X and Z. If we choose the pair of residuals with the smaller residual sum of squares, we obtain a better fit for the corresponding regression models. A reduced residual sum of squares signifies enhanced precision in eliminating the effects of the controlling variables, leading to a more accurate Pcor estimator. Guided by the objective of minimising the residual sum of squares, we introduce a novel algorithm for high-dimensional Pcor estimation in the subsequent subsection.
3.2. MRSS Algorithm and Its Implementation
We propose a novel Minimum Residual Sum of Squares partial correlation coefficient estimation algorithm, denoted by MRSS. This algorithm aims to diminish the estimation bias for positive Pcor values under high-dimensional situations. Our MRSS algorithm amalgamates the strengths of the Reg2, Coef, and Var algorithms, effectively curtailing bias in Pcor estimation.
Define $\mathrm{RSS}_X^{(k)}$ and $\mathrm{RSS}_Y^{(k)}$ as the residual sum of squares of X after removing the effects of Y and Z, and the residual sum of squares of Y after removing the effects of X and Z, respectively. The tuning parameter k is chosen by minimising the sum of squares of the residuals, so as to remove more of the associated effects and ensure a more efficient Pcor estimator. For $k = 1$, the residual pair is taken from the Reg2 algorithm's models (11) and (12); for $k = 2$, it corresponds to the residuals from the Coef and Var algorithms' models (13) and (14). Then, the residuals selected by the MRSS algorithm attain the minimum residual sum of squares for both X and Y, yielding a more efficient Pcor estimator, as follows
$$\hat k_X = \arg\min_{k\in\{1,2\}} \mathrm{RSS}_X^{(k)}, \qquad \hat k_Y = \arg\min_{k\in\{1,2\}} \mathrm{RSS}_Y^{(k)}.$$
The Pcor estimated by MRSS is then given by the candidate estimator associated with the selected index $\hat k$,
where I is the indicator function and $\hat\beta_{XY}$ is the primary regression coefficient in model (13). If $\hat k = 1$, then the Pcor is estimated following the idea of the Reg2 algorithm; if $\hat k = 2$, then it is estimated following the idea of the Coef and Var algorithm. If the two k estimates above differ, the more stable Reg2 algorithm is preferred, setting $\hat k = 1$ in the final estimator. Given that MRSS integrates two existing algorithms, its convergence rate should align with theirs.
During the implementation of the MRSS algorithm (Algorithm 1), the Coef and Var algorithm often misestimates the Pcor as exactly 0, or at the boundary values ±1, when the true Pcor is close to those values, affecting the precision of the algorithm. To address this, we incorporate a discriminative condition in the MRSS pseudo-code: if the estimated Pcor $\hat\rho_{\mathrm{Coef}}$ or $\hat\rho_{\mathrm{Var}}$ is exactly zero or ±1, the Coef and Var estimate is deemed unreliable, and the Reg2 algorithm's estimate is adopted.


The proposed MRSS algorithm selects the most suitable residuals by minimising the RSS, removing the impact of the controlling variables and optimising the estimation of the residuals in the regression models. As such, the Pcor estimate produced by the MRSS algorithm combines the advantages of both algorithms, resulting in a more accurate estimate. Notably, our MRSS algorithm effectively addresses the Pcor estimation bias in these boundary cases. For instance, when the Coef and Var algorithms estimate the Pcor as 0 for a true Pcor near 0, the MRSS algorithm uses the minimum RSS principle to select the Reg2 algorithm, which performs better in the vicinity of 0, and thereby efficiently avoids such misestimations. For intermediate Pcor values, the MRSS algorithm employs the minimum RSS principle to determine the more accurate of the Reg2 and Var methods; this selection conforms to the minimum RSS principle, whereby the regression model and the accompanying residuals are chosen to provide the best estimation accuracy, leading to a more precise Pcor estimate. When the Pcor lies close to 1, the Reg2 algorithm's estimates are typically too low, with high RSS values; in that case, the MRSS method selects the Var algorithm, whose RSS is smaller and which performs better under the minimum RSS principle. In essence, the MRSS method amalgamates the merits of the Reg2 and Var algorithms. By reducing the sum of squares of the residuals, MRSS chooses the algorithm with the smaller estimation error, which allows for proficient control of the estimation bias of the Pcor.
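The selection step itself can be sketched independently of how the candidate residuals are produced. The helper below is a schematic of Algorithm 1 as we read it (argument names are ours): it receives the Reg2 and Coef/Var residual pairs together with their Pcor estimates, picks the pair with the smaller RSS for both X and Y, and falls back to the Reg2 estimate when the two choices disagree or when the Coef/Var estimate is degenerate.

```r
## Schematic of the MRSS selection rule (our reading of Algorithm 1; not the authors' code).
mrss_select <- function(res_reg2, res_var, pcor_reg2, pcor_var) {
  # res_reg2, res_var: lists with residual vectors $x and $y from the Reg2 and Coef/Var fits
  rss_x <- c(reg2 = sum(res_reg2$x^2), var = sum(res_var$x^2))
  rss_y <- c(reg2 = sum(res_reg2$y^2), var = sum(res_var$y^2))
  k_x <- which.min(rss_x)                             # k minimising the RSS of X
  k_y <- which.min(rss_y)                             # k minimising the RSS of Y
  degenerate <- pcor_var == 0 || abs(pcor_var) == 1   # discriminative condition in the pseudo-code
  if (degenerate || k_x != k_y || k_x == 1) pcor_reg2 else pcor_var
}
```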
4. Simulation
4.1. Data Generation
To study the estimation efficiency of Pcor estimation algorithms under high-dimensional conditions, we generate n centralised samples i.i.d from with . Let , and . Initially, we produce n controlling samples independently and identically by
where and with and generated independently from the normal distribution with variance for . The samples and are then generated by
where and with and , drawn i.i.d. from . The Pearson correlation of and gives the partial correlation coefficient Pcor . Notably, there is a one-to-one mapping between the true Pcor and the parameter.
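For concreteness, a hypothetical R version of this data-generating mechanism is sketched below; the coefficient values, the sparsity pattern, and the use of a bivariate normal error pair with correlation rho are placeholder assumptions in the spirit of Example 1, not the paper's exact settings.

```r
## Hypothetical data generator in the spirit of Section 4.1 (all coefficients are placeholders).
gen_data <- function(n, p, rho, sigma2 = 1, l = 5) {
  Z <- matrix(rnorm(n * p, sd = sqrt(sigma2)), n, p)    # controlling variables
  beta_x <- c(rep(0.5, l), rep(0, p - l))               # sparse coefficients (Example 1 style)
  beta_y <- c(rep(-0.3, l), rep(0, p - l))
  S <- matrix(c(1, rho, rho, 1), 2, 2)                  # error correlation = target Pcor
  E <- matrix(rnorm(2 * n), n, 2) %*% chol(S)           # correlated error pair (eps_X, eps_Y)
  list(X = drop(Z %*% beta_x) + E[, 1],
       Y = drop(Z %*% beta_y) + E[, 2],
       Z = Z)
}
dat <- gen_data(n = 200, p = 500, rho = 0.5)            # one simulated data set
```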
Since our MRSS algorithm and the Reg2 algorithm perform essentially the same for , our simulation focuses on real Pcor values in the range , an interval prone to significant biases with existing methods. Let the true partial correlation coefficient vary as with the sample size , the controlling variable size , and the normal distribution variance . For each combination, we estimate the partial correlation coefficient for 200 replications using the aforementioned estimation algorithms. We use the software R (4.3.1) for our simulation.
Recognising that both sparse and non-sparse conditions are prevalent in real-world applications [3,28], we present examples under both conditions. To ensure comparability between the examples, the initial l coefficients of $\beta_X$ and $\beta_Y$ are fixed under both conditions, where l is the number of highly correlated controlling variables. For the non-sparse examples, the coefficients of $\beta_X$ and $\beta_Y$ beyond the l-th converge to 0 at varying rates, starting from a value that is significantly smaller than the initial l coefficients.
- Example 1: under sparse conditions. Let the coefficients $\beta_X$ and $\beta_Y$ be non-zero for the initial l elements and zero for the rest, as follows
- Example 2: under non-sparse conditions. Let the coefficients $\beta_X$ and $\beta_Y$ be the same as in Example 1 for the initial l elements, with a given convergence rate for the remaining elements, as follows, where r is a tuning parameter that makes the (l+1)-th element close to the prescribed starting value.
- Example 3: under non-sparse conditions. Let the coefficients $\beta_X$ and $\beta_Y$ be the same as in Example 1 for the initial l elements, with a slower convergence rate for the remaining elements, as follows, where r is a tuning parameter that makes the (l+1)-th element close to the prescribed starting value.
- Example 4: under non-sparse conditions. Let the coefficients $\beta_X$ and $\beta_Y$ be the same as in Example 1 for the initial l elements, with the slowest convergence rate for the remaining elements, as follows, where r is a tuning parameter that makes the (l+1)-th element close to the prescribed starting value.
4.2. Simulation Results
4.2.1. By MSE and RMSE
We assess the efficacy of the Pcor estimation algorithms using the mean square error (MSE) and root mean square error (RMSE) indices as follows; these evaluation indicators reflect the performance of the Pcor estimation algorithms from different perspectives:
$$\mathrm{MSE} = \frac{1}{K}\sum_{k=1}^{K}\left(\hat\rho_k - \rho\right)^2, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{k=1}^{K}\left(\hat\rho_k - \rho\right)^2},$$
where $\rho$ is the true Pcor and $\hat\rho_k$ is the estimated Pcor in the k-th of the K = 200 replications.
Table 1 displays the mean of MSE and RMSE () for the estimated Pcors of the true , with , , and across Examples 1–4 using various methods. Table A1 and Table A2, which consider the means of MSE and RMSE () for the estimated Pcors for high correlation controlling variables number , can be found in the Appendix.
Table 1.
The mean of MSE () and RMSE () for estimated Pcors of real with , , and in Examples 1–4.
For small sample sizes (), all algorithms tend to underperform owing to the limited data information, with the mean MSE and RMSE being approximately ten times higher than for the large sample sizes. Our MRSS algorithm nevertheless remains competitive, with both MSE and RMSE of the same order of magnitude as those of the best-performing Lasso.Reg2. For large sample sizes (), however, the MRSS algorithm's performance becomes notably superior. Specifically, MRSS reduces the MSE by around compared to the suboptimal MCP.Reg2, and this percentage grows with increasing n, representing a significant improvement in algorithmic performance. Additionally, the MSE of the MRSS algorithm exhibits a slower increase with increasing controlling-variable dimension p, implying improved stability to some extent.
To compare the performance of the different algorithms more intuitively, we calculated the percentage difference in MSE between each of the algorithms listed above and the MRSS algorithm; the percentage difference in RMSE is calculated similarly. Table 2 shows the average percentage differences in MSE and RMSE relative to the MRSS algorithm for a small sample size () and a large sample size () under the same settings as in Table 1. For a small sample size (), we observe a 10–20% decrease in MSE and RMSE for the MRSS algorithm relative to the Res algorithm, a 10–20% increase relative to Lasso.Reg2, and only a slight change relative to the other algorithms. For a large sample size (), the MRSS algorithm reduces the MSE by about 30–70% and the RMSE by 20–60% relative to the other algorithms, achieving effective control of the Pcor estimation error. These results further illustrate the superiority of the MRSS algorithm. For optimal Pcor estimation performance, we suggest using the MRSS algorithm with a minimum sample size of .
Table 2.
The average percentage difference of the MSE and RMSE compared to the MRSS algorithm for a small sample size () and a large sample size () with the same settings in Table 1.
For Examples 1–4, shifting from sparse to non-sparse conditions with increasing non-sparsity, we observe that all algorithms exhibit a higher MSE and RMSE under non-sparse conditions compared to sparse conditions, and the MSE and RMSE increase with increasing non-sparsity. This could be attributed to the greater impact and more complicated correlations of the controlling variables, resulting in a less accurate estimate of the partial correlation. However, even in Example 4 with the strongest non-sparsity, the MRSS algorithm still performs well, possessing the smallest MSE and RMSE and outperforming conventional algorithms. Especially under non-sparse conditions, the MRSS algorithm provides a dependable and accurate estimation of Pcor despite the influence of complex controlling variables.
4.2.2. For Pcor Values on
To investigate the effectiveness of the Pcor estimation algorithms over a range of Pcor values, we set a constant ratio of the dimension of the controlling variables to the sample size (i.e., a fixed p/n). Figure 1 displays the average estimated Pcor over 200 repetitions against the true Pcor for and in Example 1. The MRSS, MCP.Reg2, and MCP.Var estimates are shown in red, green, and blue, respectively. When the Pcor is small, around Pcor < 0.5, MRSS tracks the true Pcor accurately, performing similarly to MCP.Reg2. When the Pcor is large, around Pcor > 0.5, MRSS performs close to the best, comparably to MCP.Var and only slightly behind RSS2. Essentially, MRSS effectively amalgamates the strengths of both the MCP.Reg2 and MCP.Var algorithms, reducing their potential weaknesses for Pcor estimation. For a small sample size, MRSS leads to a significant improvement in the estimation of large Pcor values, but still shows a considerable estimation bias for small Pcor values owing to the limited sample size and information. For a large sample size, MRSS effectively reduces the Pcor estimation bias over the range considered. Consequently, increasing the sample size substantially boosts the MRSS estimation accuracy, even when the ratio of the controlling-variable dimension to the sample size increases from 2 to 10.
Figure 1.
Average estimated Pcor against the true Pcor for (first row) and (second row), with and , in Example 1.
4.3. Parameter Sensitivity
We investigate the sensitivity of the performance of the MRSS algorithm to different parameter settings, such as variance and sparsity. This allows us to explore the robustness of algorithms under different parameter configurations.
4.3.1. For Variance
We set a variance parameter in data generation to test the stability of our algorithm under varying variance. Table 3 shows the mean of MSE () and RMSE () for the estimated Pcors of real with different variances and for a large sample size () and small sample size () in Examples 1–4. We discover that, as the variance increases from 1 to 40, the MSE and RMSE remain consistent for various examples and sample sizes. This indicates that our MRSS algorithm is highly robust to variance and retains good stability.
Table 3.
The means of MSE () and RMSE () for the estimated Pcors of real with different variances and for large sample size () and small sample size () in Examples 1–4.
4.3.2. For Sparsity
To evaluate the effectiveness of the algorithms under different sparsity conditions, we set the data-generation conditions to progress from sparse to non-sparse, with an increasingly non-sparse convergence rate from Example 1 to Example 4. This corresponds to a greater influence of the controlling variables as we progress through the examples. From Table 1, Table 2 and Table 3 above, we observe that the MRSS algorithm performs well in all examples. For moderate non-sparse convergence rates, as in Examples 2–3, MRSS exhibits both low MSE and low RMSE, comparable to the sparse conditions of Example 1. As the non-sparsity and the impact of the controlling variables increase in Example 4, even the best-performing MRSS encounters difficulties in reducing the estimation bias. Nevertheless, the MRSS algorithm remains the most favoured choice for estimating the Pcor under both sparse and non-sparse conditions. If it is possible to analyse the degree of non-sparsity of the initial data, then we can obtain a better understanding of the algorithm's error margin.
Another indication of the sparsity strength is the number of highly correlated controlling variables, l. Figure 2 illustrates the performance of the featured algorithms for varying values of l, contrasting the average estimated Pcor with the true Pcor in Example 2, with one value of l in the first row and a larger value in the second. As l increases, the interference from the controlling variables in the estimation process becomes more pronounced, leading to a heightened estimation bias. However, the MRSS algorithm consistently shows optimal performance throughout the entire interval. Remarkably, even at the highest interference level considered, MRSS keeps its estimates in close alignment with the diagonal, in contrast to its counterparts. Table 4 shows the corresponding means of the MSE and RMSE. As l increases, both the MSE and RMSE of the MRSS algorithm increase, but they remain only slightly above the best in small samples and clearly better than those of the other algorithms in large samples. These results demonstrate the robustness, stability, and precision advantages of the MRSS algorithm.
Figure 2.
Average Pcor against true Pcor for in the first row and in the second row with in Example 2.
Table 4.
The mean of MSE () and RMSE () for estimated Pcors of real with and for a large sample size () and small sample size () in Examples 1–4.
4.4. Summaries
Based on extensive simulations, our study examines the practicality and effectiveness of the MRSS algorithm in a variety of scenarios and provides valuable insights into its accuracy. We provide empirical evidence that MRSS effectively incorporates the strengths of the MCP.Reg2 and MCP.Var algorithms and reduces the potential weaknesses of Pcor estimation, especially in challenging environments with high-dimensional sparse and non-sparse conditions. For larger sample sizes (), the MRSS algorithm reduces the MSE and RMSE by approximately 30–70% compared to the other algorithms and effectively controls the Pcor estimation error. For small sample sizes (), a reduction of 10–20% in MSE and RMSE is observed for the MRSS algorithm compared to the Res algorithm, an increase of 10–20% compared to Lasso.Reg2, and only a slight change compared to the other algorithms.
Conducting a sensitivity analysis with various variance and sparsity parameters, the outcomes demonstrate the benefits of the MRSS algorithm in terms of robustness, stability, and accuracy. As the variance increases from 1 to 40, the MSE and RMSE remain consistent across the different examples and sample sizes, demonstrating that our MRSS algorithm is remarkably resilient to variability and maintains excellent stability. As the level of sparsity decreases (from Example 1 to Example 4, or as l increases to 14), the MSE and RMSE of the MRSS algorithm noticeably increase, but remain within the same order of magnitude. Even the best-performing MRSS algorithm undergoes a significant rise in MSE and RMSE for Example 4 and the largest l, as the escalation of non-sparse and intricate controlling variables introduces certain systematic errors.
5. Real Data Analysis
A distinguishing feature of financial markets is the observed correlation among the price movements of various financial assets; in particular, there is a substantial cross-correlation in the simultaneous time evolution of stock returns [29]. In numerous instances, a strong correlation does not necessarily imply a significant direct relationship. For instance, two stocks in the same market may be subject to shared macroeconomic or investor-psychology influences. Therefore, to examine the direct correlation between two stocks, it is necessary to eliminate the common drivers represented by the market index. The Pcor meets this requirement by assessing the direct relationship between the two stocks after removing the market impacts through the controlling variables. When the Pcor is accurately estimated, it is possible to evaluate the impact of diverse factors (e.g., economic sectors, other markets, or macroeconomic factors) on a specific stock. The resulting partial correlation information may be utilised in fields such as stock market risk management, stock portfolio optimisation, and financial control [7,8]. Moreover, the Pcor can also indicate the interdependence and influence of industries in the context of global integration. These techniques for analysing the Pcor can provide valuable information on the correlations between different assets and different sectors of the economy, as they are generalisable and can be applied to other asset types and cross-asset relationships in financial markets. This information is beneficial for practitioners and policymakers.
We chose 100 stocks with substantial market capitalisation and robust liquidity from the Shanghai Stock Exchange (SSE) market; these stocks comprehensively represent the overall performance of listed stock prices in China's A-share market. We then downloaded their daily adjusted closing prices from Yahoo Finance from January 2018 to August 2023 and removed the missing data. Here, a sufficient sample size of was chosen to ensure the effectiveness of the algorithms and to limit the bias in the Pcor estimation. For each pair of the 100 stocks, we estimate their Pcor by setting the remaining stocks as the corresponding controlling variables and construct the estimated Pcor matrix. The Pcor matrix reveals the intrinsic correlation between two stocks after the influence of the rest of the stock market has been removed.
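As an illustration of how such a Pcor matrix could be assembled, the sketch below loops over stock pairs and treats all remaining stocks as controlling variables; the use of Lasso residuals (a Res-style estimate rather than the full MRSS pipeline), the return definition, and the function names are our simplifications. Note that 100 stocks already imply 4950 pairs of penalised fits, which is why computational cost is flagged as a direction for future work in Section 7.

```r
## Sketch of building a Pcor matrix from a returns matrix R (n days x m stocks); hypothetical setup.
library(glmnet)

pcor_matrix <- function(R) {
  m <- ncol(R)
  P <- diag(1, m)
  for (i in 1:(m - 1)) {
    for (j in (i + 1):m) {
      Z <- R[, -c(i, j), drop = FALSE]                 # remaining stocks as controlling variables
      ri <- R[, i] - drop(predict(cv.glmnet(Z, R[, i]), newx = Z, s = "lambda.min"))
      rj <- R[, j] - drop(predict(cv.glmnet(Z, R[, j]), newx = Z, s = "lambda.min"))
      P[i, j] <- P[j, i] <- cor(ri, rj)                # residual-correlation (Res-style) estimate
    }
  }
  P
}
## R could be, e.g., a matrix of daily log returns: R <- apply(log(prices), 2, diff)
```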
Figure 3 presents the estimated Pcor matrices for the 100 stocks from the SSE market using the MCP.Reg2, MCP.Var and MRSS algorithms. Blue signifies , while red represents . Whereas the MCP.Coef, MCP.Var, and RSS2 algorithms all estimate the Pcor as 0 when the true Pcor approaches 0, our proposed MRSS algorithm resembles MCP.Reg2, which estimates weak partial correlations accurately; thus, MRSS is capable of effectively estimating weak partial correlations. When dealing with high Pcor values and strong partial correlations, we find that the MCP.Var algorithm overestimates the Pcor as a result of the divergence in stock prices: for two stocks with higher stock prices, the Pcor estimated by the Var algorithm tends to be overestimated, in the most extreme cases reaching 1. MRSS effectively solves this problem. Notably, as a result of incorporating the MCP.Var algorithm, the MRSS algorithm strengthens certain partial correlations that MCP.Reg2 does not flag as significant; these results can also be seen in Table 5. MRSS estimates these relationships as stronger partial correlations, resulting in improved clarity of the partial correlation structure.
Figure 3.
Estimated Pcor matrices of 100 SSE stocks by the MCP.Reg2, MCP.Var, and MRSS algorithms, with blue representing and red representing .
Table 5.
Stock pairs with their sector and Pcor estimates for all the MRSS estimated Pcor by different algorithms from 100 SSE stocks.
Figure 4 shows the stocks' Pcor network for the top-100 and top-50 pairs of Pcor estimates obtained by the MRSS algorithm from the 100 SSE stocks. Each node represents a stock, coloured by its sector, and the edge thickness represents the Pcor estimate between two nodes, with thicker edges indicating larger estimates. Table 5 shows the stock pairs with their sector and Pcor estimates for all the MRSS-estimated Pcors from the 100 SSE stocks, and Table 6 shows the corresponding stock pairs with their company name, business, and sector. Here, we use industry classifications from the Global Industry Classification Standard (GICS): Communication Services, Consumer Discretionary (C.D.), Consumer Staples, Energy, Financials, Health Care, Industrials, Information Technology (I.T.), Materials, Real Estate, and Utilities. We find that two stocks connected in the partial correlation network with a high Pcor are almost always in the same sector and operate in the same business. In addition, high Pcor values may indicate shareholding relationships between companies. For instance, the highly correlated group 601398–601939–601288–601988–601328 (Financials) consists of state-controlled banks, which do not have a direct high-Pcor link with the city banks 601009–601166 (Financials). Stocks with a high Pcor that do not belong to the same industry may have other links behind them; for example, 601519 (I.T.) and 601700 (Industrials) have a common major shareholder. After stripping out the other factors influencing the market, the Pcor thus represents the inherent, intrinsic correlation between two stocks, such as their belonging to the same sector.
Figure 4.
Stocks' Pcor network for the top-100 and top-50 pairs of Pcor estimates by the MRSS algorithm from 100 SSE stocks. Each node represents a stock, coloured by its sector. The edge thickness represents the Pcor estimate between two nodes, with the thick-edge Pcor and the thin-edge Pcor .
Table 6.
Stock pairs with their company name, business, and sector for all the MRSS estimated Pcor from 100 SSE stocks.
As societies become increasingly integrated, the productive activities of different industries become interdependent and interact with each other. Categorising a company into only one industry does not reflect its overall performance and associated risks. Many listed companies in the stock market belong to conglomerates and operate in different industry sectors, so it is natural for the performance of these companies to be affected by multiple industries. Therefore, besides showing the correlations within industries, the Pcor can also reveal the correlation between two industries that are linked by a pair of stocks from different industries. For example, the partial correlation between the Bank of Communications (601328) and PetroChina (601857) links the Energy (600028–601857, in orange) and Financials (601398–601939–601288–601988–601328, in dark blue) sectors of state-owned assets.
Overall, the MRSS algorithm amalgamates the characteristics of MCP.Reg2 and MCP.Var, enhancing the estimation of strong partial correlations, while effectively estimating those weak partial correlations, ultimately revealing the stock correlations.
6. Conclusions
This paper presents a novel minimum residual sum of squares (MRSS) algorithm for estimating partial correlation coefficients. Its purpose is to reduce the estimation bias of positive partial correlation coefficients in high-dimensional settings under both sparse and non-sparse conditions. The MRSS algorithm is effective in mitigating the Pcor estimation bias by synergistically harnessing the strengths of the Coef, Reg2, and Var algorithms. We also discuss the MRSS algorithm's mathematical foundation and its performance in various scenarios compared with several existing algorithms. Through rigorous simulations and real data analysis, it becomes evident that the MRSS algorithm consistently outperforms its constituent and competing algorithms, particularly in challenging environments characterised by non-sparse conditions and high dimensionality. The sensitivity analysis with respect to the variance and sparsity parameters demonstrates the robustness, stability, and precision advantages of the MRSS algorithm. Further evidence of the effectiveness of the MRSS algorithm in the correlation analysis of stock data is provided by the real data analyses.
7. Future Work
Our proposed MRSS algorithm combines the benefits of two existing algorithms by minimising the residual sum of squares and enhancing the accuracy of Pcor estimation. In upcoming studies, we may explore the integration of additional algorithms through the minimum-RSS principle, to amalgamate the benefits of various algorithms and further improve the estimation accuracy of the integrated algorithm. Reducing the computational complexity of our minimum-RSS integration algorithm to decrease computing time represents a core issue for future research. Additionally, conducting in-depth theoretical research on the MRSS algorithm, including proofs of consistency and convergence, will be an essential direction for our next steps. Further refinement of the theoretical analysis and an in-depth investigation of the error convergence rate may uncover the reasons for the non-negligible systematic estimation bias that all current algorithms exhibit when the Pcor is positive. Meanwhile, expanding the use of the MRSS algorithm to a wider range of fields is a focal point of our future research. Concerning financial data, we intend to thoroughly examine partial correlations among financial assets beyond stocks and to advise on relevant policies.
Author Contributions
Conceptualisation and methodology, J.Y. and M.Y.; software, G.B. and J.Y.; validation and formal analysis, G.B.; data curation, writing—original draft preparation, review and editing, and visualisation, J.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Doctoral Foundation of Yunnan Normal University (Project No.2020ZB014) and the Youth Project of Yunnan Basic Research Program (Project No.202201AU070051).
Data Availability Statement
The authors confirm that the data supporting the findings of this study are available within the article.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Tables for the mean of MSE () and RMSE () of estimated Pcors of real with , , and the numbers of high correlation controlling variables in Examples 1–4.
Table A1.
The mean of MSE () and RMSE () for the estimated Pcors of real with , , and in Examples 1–4.
| n | p | MSE () | RMSE () | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | Lasso | MCP | Lasso | MCP | |||||||||||||
| Example 1 | Res | Reg2 | Res | Reg2 | Coef | Var | RSS2 | MRSS | Res | Reg2 | Res | Reg2 | Coef | Var | RSS2 | MRSS | |
| 50 | 200 | 10.0 | 11.1 | 10.6 | 10.2 | 15.8 | 14.3 | 12.1 | 8.1 | 31.2 | 31.9 | 32.0 | 31.0 | 35.3 | 33.8 | 31.5 | 27.6 |
| 500 | 11.4 | 14.3 | 11.7 | 12.3 | 20.5 | 19.1 | 16.2 | 11.3 | 33.2 | 36.0 | 33.5 | 33.9 | 40.2 | 39.1 | 36.4 | 32.6 | |
| 1000 | 11.8 | 15.8 | 12.0 | 13.1 | 22.5 | 21.5 | 18.6 | 12.8 | 33.8 | 37.6 | 34.1 | 35.0 | 42.0 | 41.2 | 39.0 | 34.7 | |
| 100 | 500 | 7.4 | 6.6 | 7.2 | 5.8 | 7.3 | 6.2 | 5.3 | 3.1 | 26.7 | 24.5 | 26.4 | 23.4 | 22.3 | 20.9 | 19.6 | 16.3 |
| 1000 | 8.6 | 8.6 | 8.5 | 7.2 | 9.2 | 8.0 | 6.8 | 4.0 | 28.9 | 27.7 | 28.6 | 26.0 | 25.2 | 23.8 | 22.2 | 18.4 | |
| 2000 | 9.6 | 10.7 | 9.4 | 8.7 | 11.2 | 10.0 | 8.5 | 5.4 | 30.4 | 30.9 | 30.1 | 28.4 | 28.1 | 26.8 | 25.0 | 21.4 | |
| 200 | 500 | 3.1 | 1.7 | 2.0 | 1.1 | 1.8 | 1.5 | 1.6 | 0.6 | 17.3 | 12.6 | 14.1 | 10.6 | 10.7 | 10.0 | 10.7 | 7.2 |
| 1000 | 4.1 | 2.6 | 2.9 | 1.6 | 2.4 | 2.0 | 2.1 | 0.8 | 19.8 | 15.6 | 16.6 | 12.5 | 12.3 | 11.6 | 12.2 | 8.1 | |
| 2000 | 5.0 | 3.7 | 3.7 | 2.2 | 3.1 | 2.6 | 2.5 | 1.0 | 21.9 | 18.3 | 19.0 | 14.4 | 13.9 | 13.0 | 13.3 | 9.0 | |
| 400 | 1000 | 1.3 | 0.7 | 0.5 | 0.4 | 0.6 | 0.5 | 0.6 | 0.3 | 11.3 | 8.3 | 7.0 | 6.3 | 6.0 | 5.8 | 6.5 | 4.6 |
| 2000 | 1.7 | 1.1 | 0.7 | 0.5 | 0.8 | 0.7 | 0.8 | 0.3 | 12.9 | 10.1 | 8.1 | 7.2 | 6.7 | 6.4 | 7.2 | 5.0 | |
| 4000 | 2.2 | 1.5 | 0.9 | 0.7 | 0.9 | 0.8 | 0.9 | 0.4 | 14.5 | 12.1 | 9.3 | 8.1 | 7.4 | 7.0 | 8.1 | 5.4 | |
| Example 2 | |||||||||||||||||
| 50 | 200 | 10.5 | 11.5 | 11.0 | 10.6 | 16.5 | 15.0 | 12.4 | 8.6 | 31.8 | 32.5 | 32.5 | 31.6 | 36.4 | 35.0 | 31.9 | 28.7 |
| 500 | 11.8 | 14.8 | 12.1 | 12.7 | 21.6 | 19.9 | 16.7 | 11.8 | 33.7 | 36.5 | 34.1 | 34.4 | 41.3 | 39.9 | 37.1 | 33.3 | |
| 1000 | 12.2 | 16.0 | 12.4 | 13.5 | 23.1 | 22.0 | 19.2 | 13.1 | 34.3 | 37.9 | 34.6 | 35.5 | 42.5 | 41.7 | 39.7 | 35.1 | |
| 100 | 500 | 7.8 | 6.9 | 7.6 | 6.1 | 7.8 | 6.7 | 5.5 | 3.3 | 27.4 | 25.0 | 27.1 | 23.9 | 23.4 | 21.9 | 20.1 | 16.8 |
| 1000 | 9.0 | 9.0 | 8.9 | 7.6 | 9.7 | 8.5 | 7.1 | 4.4 | 29.5 | 28.4 | 29.2 | 26.6 | 26.3 | 24.8 | 22.8 | 19.5 | |
| 2000 | 10.0 | 11.0 | 9.8 | 9.0 | 11.6 | 10.5 | 8.8 | 5.7 | 31.0 | 31.4 | 30.7 | 29.0 | 28.9 | 27.6 | 25.5 | 22.3 | |
| 200 | 500 | 3.3 | 1.8 | 2.3 | 1.3 | 2.0 | 1.7 | 1.7 | 0.7 | 18.0 | 13.2 | 14.9 | 11.4 | 11.3 | 10.6 | 11.1 | 7.7 |
| 1000 | 4.4 | 2.9 | 3.2 | 1.8 | 2.7 | 2.3 | 2.1 | 0.9 | 20.5 | 16.2 | 17.5 | 13.3 | 13.3 | 12.4 | 12.4 | 8.6 | |
| 2000 | 5.3 | 3.9 | 4.1 | 2.4 | 3.4 | 2.9 | 2.6 | 1.1 | 22.7 | 18.9 | 19.8 | 15.1 | 14.8 | 13.8 | 13.6 | 9.6 | |
| 400 | 1000 | 1.5 | 0.8 | 0.6 | 0.5 | 0.7 | 0.6 | 0.6 | 0.3 | 12.1 | 9.1 | 7.7 | 7.0 | 6.5 | 6.3 | 6.7 | 5.1 |
| 2000 | 1.9 | 1.2 | 0.8 | 0.6 | 0.8 | 0.7 | 0.8 | 0.4 | 13.7 | 10.9 | 8.9 | 8.0 | 7.2 | 6.9 | 7.4 | 5.5 | |
| 4000 | 2.4 | 1.7 | 1.1 | 0.8 | 1.0 | 0.9 | 1.0 | 0.4 | 15.3 | 12.8 | 10.2 | 8.9 | 8.0 | 7.6 | 8.3 | 6.0 | |
| Example 3 | |||||||||||||||||
| 50 | 200 | 11.4 | 12.3 | 12.0 | 11.5 | 18.1 | 16.6 | 13.9 | 10.0 | 33.1 | 33.5 | 34.0 | 32.9 | 38.1 | 36.8 | 34.2 | 30.8 |
| 500 | 12.7 | 15.5 | 13.0 | 13.7 | 22.9 | 21.4 | 18.2 | 13.2 | 34.9 | 37.3 | 35.4 | 35.8 | 42.4 | 41.4 | 38.9 | 35.2 | |
| 1000 | 13.2 | 17.0 | 13.5 | 14.5 | 24.6 | 23.4 | 20.6 | 14.5 | 35.7 | 39.1 | 36.1 | 36.8 | 43.8 | 42.9 | 40.9 | 36.8 | |
| 100 | 500 | 8.6 | 7.7 | 8.6 | 7.0 | 8.7 | 7.6 | 6.4 | 4.0 | 28.8 | 26.3 | 28.7 | 25.5 | 25.5 | 24.0 | 21.9 | 18.9 |
| 1000 | 9.9 | 9.5 | 9.8 | 8.5 | 10.9 | 9.7 | 7.9 | 5.2 | 30.9 | 29.2 | 30.7 | 28.1 | 28.4 | 26.9 | 24.4 | 21.7 | |
| 2000 | 10.9 | 11.8 | 10.8 | 9.9 | 13.1 | 11.9 | 10.1 | 6.8 | 32.5 | 32.5 | 32.2 | 30.5 | 31.5 | 30.0 | 27.8 | 24.9 | |
| 200 | 500 | 4.0 | 2.3 | 2.9 | 1.7 | 2.4 | 2.1 | 1.9 | 0.9 | 19.7 | 14.8 | 16.7 | 13.0 | 13.3 | 12.5 | 11.8 | 9.3 |
| 1000 | 5.1 | 3.4 | 3.9 | 2.4 | 3.2 | 2.8 | 2.5 | 1.2 | 22.3 | 17.7 | 19.4 | 15.0 | 15.3 | 14.3 | 13.6 | 10.5 | |
| 2000 | 6.2 | 4.6 | 4.9 | 3.1 | 4.1 | 3.5 | 3.1 | 1.5 | 24.4 | 20.4 | 21.7 | 17.0 | 17.0 | 15.9 | 15.0 | 11.7 | |
| 400 | 1000 | 2.0 | 1.2 | 1.0 | 0.8 | 0.9 | 0.9 | 0.8 | 0.5 | 13.9 | 10.9 | 9.7 | 8.9 | 8.3 | 8.0 | 7.7 | 6.7 |
| 2000 | 2.5 | 1.7 | 1.2 | 1.0 | 1.1 | 1.0 | 0.9 | 0.6 | 15.6 | 12.8 | 10.9 | 9.9 | 9.1 | 8.8 | 8.3 | 7.3 | |
| 4000 | 3.1 | 2.2 | 1.5 | 1.2 | 1.4 | 1.2 | 1.2 | 0.7 | 17.2 | 14.6 | 12.2 | 10.8 | 10.0 | 9.6 | 9.3 | 7.9 | |
| Example 4 | |||||||||||||||||
| 50 | 200 | 13.9 | 14.2 | 14.7 | 14.0 | 22.4 | 20.7 | 17.7 | 13.3 | 36.6 | 36.0 | 37.6 | 36.3 | 42.3 | 41.0 | 39.0 | 35.5 |
| 500 | 16.5 | 18.4 | 16.9 | 17.2 | 27.5 | 26.4 | 23.8 | 17.5 | 39.8 | 40.7 | 40.3 | 40.1 | 45.9 | 45.2 | 44.1 | 40.3 | |
| 1000 | 18.0 | 20.5 | 18.2 | 18.8 | 28.6 | 28.0 | 26.1 | 19.2 | 41.6 | 42.9 | 41.9 | 42.0 | 46.5 | 46.3 | 45.7 | 42.3 | |
| 100 | 500 | 12.4 | 10.8 | 12.4 | 10.3 | 13.7 | 12.4 | 10.2 | 7.8 | 34.5 | 31.1 | 34.5 | 31.0 | 33.4 | 31.9 | 29.2 | 27.2 |
| 1000 | 14.6 | 13.4 | 14.6 | 12.7 | 18.3 | 16.8 | 14.0 | 10.8 | 37.5 | 34.8 | 37.5 | 34.5 | 38.7 | 37.3 | 34.6 | 32.0 | |
| 2000 | 16.6 | 16.5 | 16.6 | 15.4 | 22.1 | 20.6 | 17.9 | 14.2 | 40.0 | 38.5 | 40.0 | 38.0 | 42.1 | 41.0 | 39.1 | 36.7 | |
| 200 | 500 | 7.0 | 4.7 | 5.9 | 4.1 | 5.1 | 4.7 | 3.6 | 2.9 | 26.0 | 20.8 | 23.8 | 19.9 | 21.3 | 20.5 | 17.7 | 16.9 |
| 1000 | 9.4 | 6.8 | 8.1 | 5.7 | 7.3 | 6.6 | 5.1 | 4.1 | 30.1 | 24.8 | 27.9 | 23.2 | 25.1 | 24.2 | 21.1 | 20.0 | |
| 2000 | 11.6 | 8.9 | 10.3 | 7.4 | 9.4 | 8.7 | 6.8 | 5.5 | 33.4 | 28.2 | 31.4 | 26.4 | 28.4 | 27.5 | 24.3 | 23.1 | |
| 400 | 1000 | 5.2 | 3.9 | 3.7 | 3.3 | 3.5 | 3.3 | 2.4 | 2.7 | 22.3 | 19.2 | 18.8 | 17.7 | 18.1 | 17.8 | 15.2 | 16.2 |
| 2000 | 6.8 | 5.5 | 4.9 | 4.3 | 4.6 | 4.4 | 3.2 | 3.6 | 25.5 | 22.7 | 21.6 | 20.3 | 20.7 | 20.3 | 17.4 | 18.7 | |
| 4000 | 8.5 | 7.0 | 6.2 | 5.4 | 5.9 | 5.6 | 4.2 | 4.5 | 28.5 | 25.4 | 24.4 | 22.6 | 23.4 | 23.0 | 19.6 | 20.9 | |
Table A2.
The mean of MSE () and RMSE () for estimated Pcors of real with , , and in Examples 1–4.
| n | p | MSE () | RMSE () | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | Lasso | MCP | Lasso | MCP | |||||||||||||
| Example 1 | Res | Reg2 | Res | Reg2 | Coef | Var | RSS2 | MRSS | Res | Reg2 | Res | Reg2 | Coef | Var | RSS2 | MRSS | |
| 50 | 200 | 68.5 | 36.8 | 70.4 | 55.2 | 87.9 | 96.0 | 61.5 | 57.4 | 80.9 | 58.2 | 81.9 | 72.4 | 92.9 | 97.1 | 76.3 | 74.3 |
| 500 | 91.0 | 49.5 | 91.2 | 71.9 | 86.8 | 93.2 | 77.3 | 73.7 | 93.4 | 67.7 | 93.4 | 82.7 | 92.3 | 95.7 | 86.2 | 84.4 | |
| 1000 | 100.9 | 56.4 | 98.8 | 78.3 | 83.5 | 89.3 | 83.0 | 79.0 | 98.3 | 72.5 | 97.3 | 86.3 | 90.5 | 93.6 | 89.6 | 87.2 | |
| 100 | 500 | 39.4 | 21.4 | 18.2 | 14.2 | 91.1 | 103.2 | 30.5 | 14.4 | 61.3 | 44.1 | 41.5 | 36.6 | 94.9 | 100.6 | 49.9 | 36.8 |
| 1000 | 54.4 | 28.3 | 30.9 | 22.9 | 97.9 | 104.7 | 35.7 | 23.3 | 72.1 | 50.6 | 54.1 | 46.4 | 98.1 | 101.3 | 54.6 | 46.9 | |
| 2000 | 69.6 | 34.6 | 47.9 | 34.4 | 99.8 | 104.6 | 41.9 | 34.8 | 81.5 | 55.9 | 67.4 | 56.9 | 98.9 | 101.1 | 60.8 | 57.5 | |
| 200 | 500 | 10.6 | 5.3 | 1.7 | 1.7 | 7.1 | 13.2 | 2.8 | 0.8 | 31.8 | 22.5 | 13.0 | 12.8 | 21.9 | 28.4 | 14.1 | 8.2 |
| 1000 | 16.4 | 9.4 | 2.5 | 2.4 | 20.9 | 32.3 | 4.5 | 1.3 | 39.5 | 29.6 | 15.6 | 15.4 | 40.8 | 49.8 | 18.2 | 10.5 | |
| 2000 | 23.4 | 14.3 | 3.7 | 3.7 | 43.6 | 56.7 | 8.4 | 2.4 | 47.3 | 36.5 | 18.8 | 18.7 | 63.4 | 72.4 | 25.9 | 14.9 | |
| 400 | 1000 | 4.4 | 2.1 | 0.5 | 0.5 | 0.9 | 0.8 | 0.9 | 0.2 | 20.5 | 14.2 | 6.9 | 6.9 | 7.3 | 6.9 | 7.8 | 4.5 |
| 2000 | 6.4 | 3.7 | 0.6 | 0.7 | 1.2 | 1.2 | 1.2 | 0.3 | 24.8 | 19.0 | 7.8 | 8.0 | 8.3 | 8.6 | 8.8 | 4.9 | |
| 4000 | 9.0 | 6.0 | 0.8 | 0.9 | 1.8 | 3.2 | 1.5 | 0.4 | 29.3 | 24.2 | 9.0 | 9.4 | 10.3 | 13.4 | 9.8 | 5.6 | |
| Example 2 | |||||||||||||||||
| 50 | 200 | 68.6 | 37.0 | 70.5 | 56.0 | 88.1 | 95.9 | 61.4 | 58.1 | 80.9 | 58.4 | 82.0 | 73.0 | 93.0 | 97.1 | 76.3 | 74.9 |
| 500 | 91.5 | 49.9 | 91.3 | 71.9 | 87.3 | 94.4 | 77.6 | 73.9 | 93.6 | 68.2 | 93.4 | 82.7 | 92.5 | 96.3 | 86.3 | 84.5 | |
| 1000 | 101.0 | 56.9 | 99.2 | 78.2 | 83.5 | 89.4 | 83.5 | 78.9 | 98.4 | 72.7 | 97.5 | 86.3 | 90.4 | 93.7 | 89.9 | 87.2 | |
| 100 | 500 | 39.9 | 21.7 | 18.8 | 14.7 | 92.2 | 103.2 | 30.7 | 14.8 | 61.7 | 44.4 | 42.2 | 37.2 | 95.4 | 100.6 | 50.1 | 37.4 |
| 1000 | 55.0 | 28.7 | 31.9 | 23.6 | 98.0 | 104.9 | 36.0 | 23.8 | 72.4 | 51.0 | 55.0 | 47.1 | 98.1 | 101.3 | 54.9 | 47.5 | |
| 2000 | 70.0 | 34.8 | 48.8 | 35.2 | 99.8 | 104.8 | 42.4 | 35.8 | 81.8 | 56.0 | 68.1 | 57.6 | 98.9 | 101.2 | 61.2 | 58.3 | |
| 200 | 500 | 11.0 | 5.6 | 1.9 | 1.9 | 7.6 | 13.8 | 3.0 | 0.9 | 32.4 | 23.1 | 13.8 | 13.5 | 22.8 | 29.3 | 14.5 | 8.7 |
| 1000 | 16.8 | 9.8 | 2.8 | 2.7 | 22.1 | 32.7 | 4.9 | 1.4 | 40.0 | 30.2 | 16.4 | 16.2 | 42.5 | 50.6 | 19.3 | 11.3 | |
| 2000 | 23.9 | 14.9 | 4.0 | 4.0 | 44.7 | 57.8 | 8.8 | 2.7 | 47.8 | 37.2 | 19.6 | 19.5 | 64.4 | 73.0 | 26.6 | 16.1 | |
| 400 | 1000 | 4.7 | 2.3 | 0.6 | 0.6 | 1.1 | 0.9 | 1.0 | 0.3 | 21.2 | 14.9 | 7.6 | 7.7 | 7.9 | 7.6 | 8.1 | 5.0 |
| 2000 | 6.8 | 4.0 | 0.8 | 0.8 | 1.3 | 1.4 | 1.3 | 0.4 | 25.4 | 19.8 | 8.7 | 8.8 | 8.8 | 9.2 | 9.1 | 5.5 | |
| 4000 | 9.4 | 6.5 | 1.0 | 1.1 | 1.9 | 3.5 | 1.6 | 0.5 | 30.0 | 25.0 | 9.8 | 10.2 | 10.9 | 14.2 | 10.2 | 6.2 | |
| Example 3 | |||||||||||||||||
| 50 | 200 | 70.3 | 38.1 | 72.5 | 57.6 | 89.4 | 97.2 | 63.2 | 60.2 | 81.9 | 59.3 | 83.1 | 74.0 | 93.7 | 97.7 | 77.5 | 76.2 |
| 500 | 93.1 | 51.3 | 92.8 | 73.4 | 88.4 | 95.0 | 79.0 | 75.6 | 94.4 | 69.1 | 94.2 | 83.6 | 93.1 | 96.6 | 87.1 | 85.5 | |
| 1000 | 102.5 | 58.2 | 100.7 | 80.0 | 85.1 | 91.0 | 85.3 | 80.3 | 99.1 | 73.7 | 98.2 | 87.3 | 91.3 | 94.5 | 90.7 | 87.8 | |
| 100 | 500 | 42.4 | 23.2 | 21.9 | 17.3 | 94.4 | 105.3 | 31.9 | 17.5 | 63.6 | 46.0 | 45.5 | 40.4 | 96.5 | 101.6 | 51.1 | 40.7 |
| 1000 | 57.6 | 30.2 | 35.7 | 26.8 | 100.1 | 106.7 | 37.0 | 27.2 | 74.1 | 52.2 | 58.2 | 50.3 | 99.1 | 102.1 | 55.8 | 50.8 | |
| 2000 | 72.2 | 36.0 | 52.3 | 37.9 | 101.8 | 106.5 | 43.7 | 38.5 | 83.1 | 57.1 | 70.6 | 59.8 | 99.8 | 102.0 | 62.3 | 60.6 | |
| 200 | 500 | 13.0 | 7.3 | 3.3 | 3.2 | 10.4 | 17.5 | 4.0 | 1.7 | 35.3 | 26.2 | 17.7 | 17.4 | 28.7 | 35.5 | 17.6 | 12.8 |
| 1000 | 19.3 | 11.9 | 4.4 | 4.3 | 26.0 | 37.6 | 6.3 | 2.6 | 42.9 | 33.3 | 20.5 | 20.3 | 47.3 | 55.9 | 22.2 | 15.6 | |
| 2000 | 26.9 | 17.1 | 5.9 | 5.9 | 49.8 | 64.1 | 10.8 | 4.3 | 50.7 | 39.8 | 23.9 | 23.7 | 68.6 | 77.7 | 29.5 | 20.3 | |
| 400 | 1000 | 6.3 | 3.7 | 1.4 | 1.5 | 1.8 | 1.7 | 1.6 | 0.8 | 24.6 | 19.0 | 11.9 | 11.9 | 12.1 | 11.7 | 11.1 | 9.0 |
| 2000 | 8.8 | 6.0 | 1.8 | 1.8 | 2.2 | 2.3 | 1.9 | 1.0 | 29.1 | 24.0 | 13.0 | 13.2 | 13.3 | 14.0 | 12.4 | 9.8 | |
| 4000 | 11.8 | 8.8 | 2.1 | 2.2 | 3.3 | 5.2 | 2.4 | 1.2 | 33.6 | 29.1 | 14.3 | 14.7 | 16.1 | 19.6 | 13.6 | 10.7 | |
| Example 4 | |||||||||||||||||
| 50 | 200 | 73.4 | 40.2 | 76.7 | 61.4 | 92.4 | 99.9 | 66.4 | 64.5 | 83.8 | 61.0 | 85.5 | 76.4 | 95.2 | 99.0 | 79.5 | 78.9 |
| 500 | 96.6 | 53.6 | 96.6 | 77.3 | 92.5 | 99.2 | 82.7 | 80.1 | 96.3 | 70.8 | 96.1 | 85.8 | 95.2 | 98.6 | 89.1 | 88.0 | |
| 1000 | 106.6 | 61.8 | 105.8 | 85.4 | 90.4 | 96.4 | 91.1 | 86.4 | 101.1 | 76.0 | 100.7 | 90.2 | 94.1 | 97.2 | 93.8 | 91.2 | |
| 100 | 500 | 49.2 | 27.9 | 31.0 | 25.1 | 102.1 | 111.1 | 34.7 | 25.4 | 68.5 | 50.5 | 54.2 | 48.7 | 100.2 | 104.2 | 53.7 | 49.1 |
| 1000 | 66.0 | 35.5 | 48.4 | 37.1 | 106.9 | 112.1 | 41.0 | 37.8 | 79.4 | 56.8 | 67.8 | 59.3 | 102.3 | 104.6 | 59.8 | 60.0 | |
| 2000 | 81.7 | 41.8 | 66.8 | 49.8 | 109.2 | 112.9 | 50.4 | 50.8 | 88.5 | 61.9 | 79.9 | 68.7 | 103.2 | 104.9 | 67.8 | 69.7 | |
| 200 | 500 | 18.9 | 12.2 | 8.3 | 8.1 | 20.8 | 29.7 | 8.9 | 6.3 | 42.4 | 33.8 | 28.2 | 27.8 | 44.0 | 51.6 | 27.8 | 24.8 |
| 1000 | 28.3 | 19.4 | 11.7 | 11.5 | 45.0 | 59.2 | 14.7 | 9.9 | 51.9 | 42.4 | 33.5 | 33.0 | 66.1 | 75.3 | 35.1 | 31.0 | |
| 2000 | 38.8 | 26.4 | 16.2 | 15.8 | 76.1 | 92.8 | 23.6 | 15.2 | 60.9 | 49.3 | 39.3 | 38.7 | 87.0 | 95.7 | 43.2 | 38.1 | |
| 400 | 1000 | 13.2 | 10.3 | 6.9 | 6.9 | 8.0 | 7.8 | 6.4 | 6.0 | 35.5 | 31.2 | 25.7 | 25.8 | 27.5 | 27.5 | 24.9 | 24.3 |
| 2000 | 18.8 | 15.7 | 9.2 | 9.4 | 11.6 | 12.5 | 9.7 | 8.2 | 42.4 | 38.5 | 29.7 | 29.9 | 33.4 | 34.9 | 30.6 | 28.4 | |
| 4000 | 24.9 | 21.4 | 11.7 | 12.0 | 18.3 | 22.4 | 12.3 | 10.8 | 48.8 | 44.9 | 33.6 | 34.0 | 42.0 | 46.4 | 34.0 | 32.5 | |
References
- Tabachnick, B.G.; Fidell, L.S.; Ullman, J.B. Using Multivariate Statistics, 6th ed.; Pearson: Boston, MA, USA, 2013. [Google Scholar]
- Huang, Y.; Chang, X.; Zhang, Y.; Chen, L.; Liu, X. Disease characterization using a partial correlation-based sample-specific network. Brief. Bioinform. 2021, 22, bbaa062. [Google Scholar] [CrossRef]
- Peng, J.; Wang, P.; Zhou, N.; Zhu, J. Partial correlation estimation by joint sparse regression models. J. Am. Stat. Assoc. 2009, 104, 735–746. [Google Scholar] [CrossRef] [PubMed]
- De La Fuente, A.; Bing, N.; Hoeschele, I.; Mendes, P. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 2004, 20, 3565–3574. [Google Scholar] [CrossRef] [PubMed]
- Marrelec, G.; Kim, J.; Doyon, J.; Horwitz, B. Large-scale neural model validation of partial correlation analysis for effective connectivity investigation in functional MRI. Hum. Brain Mapp. 2009, 30, 941–950. [Google Scholar] [CrossRef] [PubMed]
- Wang, G.J.; Xie, C.; Stanley, H.E. Correlation structure and evolution of world stock markets: Evidence from Pearson and partial correlation-based networks. Comput. Econ. 2018, 51, 607–635. [Google Scholar] [CrossRef]
- Kenett, D.Y.; Tumminello, M.; Madi, A.; Gur-Gershgoren, G.; Mantegna, R.N.; Ben-Jacob, E. Dominating clasp of the financial sector revealed by partial correlation analysis of the stock market. PLoS ONE 2010, 5, e15032. [Google Scholar] [CrossRef]
- Kenett, D.Y.; Huang, X.; Vodenska, I.; Havlin, S.; Stanley, H.E. Partial correlation analysis: Applications for financial markets. Quant. Finance 2015, 15, 569–578. [Google Scholar] [CrossRef]
- Michis, A.A. Multiscale partial correlation clustering of stock market returns. J. Risk Financ. Manag. 2022, 15, 24. [Google Scholar] [CrossRef]
- Singh, V.; Li, B.; Roca, E. Global and regional linkages across market cycles: Evidence from partial correlations in a network framework. Appl. Econ. 2019, 51, 3551–3582. [Google Scholar] [CrossRef]
- Epskamp, S.; Fried, E.I. A tutorial on regularized partial correlation networks. Psychol. Methods 2018, 23, 617–634. [Google Scholar] [CrossRef]
- Williams, D.R.; Rast, P. Back to the basics: Rethinking partial correlation network methodology. Brit. J. Math. Stat. Psy. 2020, 73, 187–212. [Google Scholar] [CrossRef] [PubMed]
- Waldorp, L.; Marsman, M. Relations between networks, regression, partial correlation, and the latent variable model. Multivariate Behav. Res. 2022, 57, 994–1006. [Google Scholar] [CrossRef]
- Gvozdarev, A.; Parovik, R. On the relationship between the fractal dimension of geomagnetic variations at Altay and the space weather characteristics. Mathematics 2023, 11, 3449. [Google Scholar] [CrossRef]
- Khare, K.; Oh, S.Y.; Rajaratnam, B. A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees. J. R. Stat. Soc. B 2015, 77, 803–825. [Google Scholar] [CrossRef]
- Kim, S. ppcor: An R package for a fast calculation to semi-partial correlation coefficients. Commun. Stat. Appl. Methods 2015, 22, 665. [Google Scholar] [CrossRef] [PubMed]
- Huang, Z.; Deb, N.; Sen, B. Kernel partial correlation coefficient—A measure of conditional dependence. J. Mach. Learn. Res. 2022, 23, 9699–9756. [Google Scholar]
- Van Aert, R.C.; Goos, C. A critical reflection on computing the sampling variance of the partial correlation coefficient. Res. Synth. Methods 2023, 14, 520–525. [Google Scholar] [CrossRef]
- Hu, H.; Qiu, Y. Inference for nonparanormal partial correlation via regularized rank based nodewise regression. Biometrics 2023, 79, 1173–1186. [Google Scholar] [CrossRef]
- Cox, D.R.; Wermuth, N. Multivariate Dependencies–Models, Analysis and Interpretation; Chapman and Hall: London, UK, 1996. [Google Scholar]
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Owen, A.B. A robust hybrid of lasso and ridge regression. Contemp. Math. 2007, 443, 59–72. [Google Scholar]
- Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
- Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B. 2005, 67, 301–320. [Google Scholar] [CrossRef]
- Tibshirani, R.; Saunders, M.; Rosset, S.; Zhu, J.; Knight, K. Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. B. 2005, 67, 91–108. [Google Scholar] [CrossRef]
- Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942. [Google Scholar] [CrossRef] [PubMed]
- Wang, H. Coordinate descent algorithm for covariance graphical lasso. Stat. Comput. 2014, 24, 521–529. [Google Scholar] [CrossRef]
- Fan, J.; Fan, Y.; Lv, J. High dimensional covariance matrix estimation using a factor model. J. Econom. 2008, 147, 186–197. [Google Scholar] [CrossRef]
- Elton, E.J.; Gruber, M.J.; Brown, S.J.; Goetzmann, W.N. Modern Portfolio Theory and Investment Analysis; John Wiley and Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).