Article

Some Useful Techniques for High-Dimensional Statistics

School of Mathematical & Statistical Sciences, Southern Illinois University, Carbondale, IL 62901-4408, USA
Stats 2025, 8(3), 60; https://doi.org/10.3390/stats8030060
Submission received: 23 May 2025 / Revised: 29 June 2025 / Accepted: 6 July 2025 / Published: 13 July 2025

Abstract

High-dimensional statistics are used when $n < 5p$, where n is the sample size and p is the number of predictors. Useful techniques include (a) the use of a sparse fitted model, (b) the use of principal component analysis for dimension reduction, (c) the use of alternative multivariate dispersion estimators instead of the sample covariance matrix, (d) the elimination of weak predictors, and (e) stacking low-dimensional estimators into a vector. Some variants and theory for these techniques will be given or reviewed.

1. Introduction

High-dimensional statistics are used when $n < 5p$, where n is the sample size and p is the number of variables. Such a model is overfit: the model does not have enough data to estimate the p parameters accurately. Then n tends not to be large enough for classical statistical methods to be useful. An alternative (but less general) definition of high-dimensional statistics is that p is large. Sometimes the setting $p > Kn$ with $K \geq 10$ is called ultrahigh-dimensional statistics.
Some important statistical methods include regression, multivariate statistics, and classification. These methods are important for statistical learning ≈ machine learning, an important part of artificial intelligence. Let the predictor variables for regression or multivariate statistics be $x = (x_1, \ldots, x_p)^T$. Let Y be a response variable for regression or classification. Important regression models include generalized linear models, nonlinear regression, nonparametric regression, and survival regression models. Inference for multivariate regression where there are m response variables $Y_1, \ldots, Y_m$ is also of interest. Useful references for the following statistical methods include [1,2,3].
Let the population covariance matrices be
$$\mbox{Cov}(x) = E[(x - E(x))(x - E(x))^T] = \Sigma_x \ \ \mbox{ and } \ \ \mbox{Cov}(x, Y) = E[(x - E(x))(Y - E(Y))] = \Sigma_{xY}.$$
Let the sample covariance matrices be
$$\hat{\Sigma}_x = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T \ \ \mbox{ and } \ \ \hat{\Sigma}_{xY} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y}).$$
Let the population correlations be $\rho_{ij} = \rho_{x_i, x_j} = \mbox{Cor}(x_i, x_j)$ and the sample correlations $r_{ij} = r_{x_i, x_j} = \mbox{cor}(x_i, x_j)$. Let the population correlation matrices be $\mbox{Cor}(x) = \rho_x = (\rho_{ij})$ and $\mbox{Cor}(x, Y) = \rho_{xY} = (\rho_{x_1,Y}, \ldots, \rho_{x_p,Y})^T$. Let the sample correlation matrices be $R_x = (r_{ij})$ and $r_{xY} = (r_{x_1,Y}, \ldots, r_{x_p,Y})^T$. Then $\hat{\Sigma}_x$ and $R_x$ are dispersion estimators, and $(\bar{x}, \hat{\Sigma}_x)$ is an estimator of multivariate location and dispersion.
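These sample quantities are easy to compute in base R. The following sketch, which is not from the paper, simulates a small data set that is reused in the later sketches; the names x, y, n, and p are illustrative.
set.seed(1)
n <- 100; p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)   # n x p predictor matrix
y <- drop(x %*% c(1, 1, 0, 0, 0) + rnorm(n))    # response with two strong predictors
Sigxhat  <- cov(x)      # p x p sample covariance matrix Sigmahat_x
SigxYhat <- cov(x, y)   # p x 1 sample covariances Sigmahat_xY
Rx  <- cor(x)           # p x p sample correlation matrix
rxY <- cor(x, y)        # p x 1 sample correlations of the predictors with Y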
Suppose the positive semi-definite dispersion matrix $\Sigma$ has eigenvalue-eigenvector pairs $(\lambda_1, d_1), \ldots, (\lambda_p, d_p)$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. Let the eigenvalue-eigenvector pairs of $\hat{\Sigma}$ be $(\hat{\lambda}_1, \hat{d}_1), \ldots, (\hat{\lambda}_p, \hat{d}_p)$, where $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$. These vectors are important quantities for principal component analysis (PCA).
Let the multiple linear regression model be
$$Y_i = \alpha + x_{i,1} \beta_1 + \cdots + x_{i,p} \beta_p + e_i = \alpha + x_i^T \beta + e_i$$
for $i = 1, \ldots, n$. In matrix form, this model is $Y = X\delta + e$, where $Y$ is an $n \times 1$ vector of dependent variables, $X$ is an $n \times (p+1)$ matrix of predictors, $\delta = (\alpha, \beta^T)^T$ is a $(p+1) \times 1$ vector of unknown coefficients, and $e$ is an $n \times 1$ vector of unknown errors. Assume that the $e_i$ are independent and identically distributed (iid) with expected value $E(e_i) = 0$ and variance $V(e_i) = \sigma^2$.
Principal components regression (PCR), partial least squares (PLS), and several other dimension reduction models use p linear combinations $\gamma_1^T x, \ldots, \gamma_p^T x$. Estimating the $\gamma_i$ and performing the ordinary least squares (OLS) regression of Y on $(\hat{\gamma}_1^T x, \hat{\gamma}_2^T x, \ldots, \hat{\gamma}_k^T x)$ and a constant gives the k-component estimator, e.g., the k-component PLS estimator or the k-component PCR estimator, for $k = 1, \ldots, J$ where $J \leq p$ and the p-component estimator is the OLS estimator $\hat{\beta}_{OLS}$. Let $\gamma_i(PCR) = d_i$ and $\gamma_i = \gamma_i(PLS)$. The model selection estimator chooses one of the k-component estimators, e.g., using cross validation, and it will be denoted by $\hat{\beta}_{MSPLS}$ or $\hat{\beta}_{MSPCR}$.
Let $X = [1 \ X_1]$. Ref. [4] noted that one way to formulate PLS is to solve an optimization problem by forming $b_j = \hat{\gamma}_j$ iteratively where
$$b_k = \arg\max_{b} \{[\mbox{Cor}(Y, X_1 b)]^2 \ V(X_1 b)\} \qquad (2)$$
subject to $b^T b = 1$ and $b^T \Sigma_x b_j = 0$ for $j = 1, \ldots, k-1$. Here V stands for the variance. So PLS is a model-free way to get predictors $\hat{\gamma}_i^T x$ that are fairly highly correlated with the response, and the absolute correlations tend to decrease quickly.
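As an illustration of the k-component estimators, the PCR version can be computed with base R by regressing Y on the first k principal component scores; a PLS analogue would replace the scores with estimated PLS components (e.g., from the pls package). This sketch uses the simulated x and y from the sketch above and is not the author's code.
pc <- prcomp(x)                                   # eigenvectors dhat_i of Sigmahat_x
k  <- 2
U  <- pc$x[, 1:k, drop = FALSE]                   # first k PCR components dhat_j^T x
pcr_k <- lm(y ~ U)                                # k-component PCR estimator
beta_pcr_k <- pc$rotation[, 1:k, drop = FALSE] %*% coef(pcr_k)[-1]  # slopes on the original scale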
In high dimensions, it is very difficult to estimate a $p \times 1$ vector $\theta$, e.g., $\theta = \beta$ or $\theta = \Sigma_{xY}$. This result is a form of “the curse of dimensionality.” If a $\sqrt{n}$ consistent estimator of $\theta$ is available, then the squared norm
$$\|\hat{\theta} - \theta\|^2 = \sum_{i=1}^p (\hat{\theta}_i - \theta_i)^2 \approx p/n. \qquad (3)$$
Since $\hat{\theta}$ is a consistent estimator of $\theta$ if $\|\hat{\theta} - \theta\|^2 \to 0$ as $n \to \infty$, the high-dimensional estimator $\hat{\theta}$ often has not been shown to be consistent, except under very strong regularity conditions.
Although this paper reviews some important techniques for high-dimensional statistics, there is also new material. First, for multiple linear regression, Section 2.1 proves that “sensible model selection estimators” produce fitted values similar to those of the full OLS model when n is much larger than p, even if heterogeneity is present. Second, Section 2.3 shows how to improve PCR and gives a new explanation for why PLS components should be used instead of PCR components from PCA. Third, Section 2.3 provides a relationship between canonical correlation analysis and PLS. Fourth, Section 2.3 introduces the SC scree plot. Fifth, Section 2.5 suggests a simple method for choosing $\delta$ for a regularized correlation matrix $R_\delta$. The outlier-resistant methods of Section 2.5 are not new, but they have not been used with most high-dimensional procedures.

2. Materials and Methods

2.1. Model Selection Estimators in Low Dimensions

This subsection explains why “sensible model selection estimators, including variable selection estimators,” produce fitted values (predictions) similar to those of the full OLS model when n is much larger than p. The result in Equation (4) that the residuals from the model selection model and the full OLS model are highly correlated is a property of OLS and the Mallows $C_p$ criterion, not of any underlying model, but linearity forces the fitted values to be highly correlated. Hence the result holds whenever OLS is consistent and the population model is linear, so for weighted least squares, AR(p) time series, serially correlated errors, et cetera. In particular, the cases do not need to be iid from some distribution. Since the correlation gets arbitrarily close to 1, the model selection estimator and the full OLS estimator are estimating the same population parameter $\beta$, but it is possible that the model selection estimator picks the full OLS model with probability going to one.
Consider the OLS regression of Y on a constant and $w = (W_1, \ldots, W_p)^T$ where, for example, $W_j = x_j$, $W_j = \hat{\gamma}_j^T x$, or $W_j = \hat{d}_j^T x$. Let I index the variables in the model, so $I = \{1, 2, 4\}$ means that $w_I = (W_1, W_2, W_4)^T$ was selected. The full model $I = F$ uses all p predictors and the constant with $\beta_I = \beta_F = \beta = \beta_{OLS}$. Let $r$ be the residuals from the full OLS model and let $r_I$ be the residuals from model I that uses $\hat{\beta}_I$. Suppose model I uses k predictors including a constant with $2 \leq k \leq p + 1$. Ref. [5] proved that the model I with k predictors that minimizes the [6] $C_p(I)$ criterion maximizes cor($r, r_I$), that
$$\mbox{cor}(r, r_I) = \sqrt{\frac{n - (p+1)}{C_p(I) + n - 2k}},$$
and under linearity, cor($r, r_I$) $\to 1$ forces
$$\mbox{cor}(\hat{\alpha} + w^T \hat{\beta}, \ \hat{\alpha}_I + w_I^T \hat{\beta}_I) = \mbox{cor}(ESP, ESP(I)) = \mbox{cor}(\hat{Y}, \hat{Y}_I) \to 1.$$
Thus $C_p(I) \leq 2k$ implies that
$$\mbox{cor}(r, r_I) \geq \sqrt{1 - \frac{p+1}{n}}. \qquad (4)$$
Let the model $I_{min}$ minimize the $C_p$ criterion among the models considered with $C_p(I) \leq 2k_I$. Then $C_p(I_{min}) \leq C_p(F) = p + 1$, and if PLS or PCR is selected using model selection (on models $I_1, \ldots, I_p$ with $I_j = \{1, 2, \ldots, j\}$ corresponding to the j-component regression) with the $C_p(I)$ criterion, and $n \geq 20(p+1)$, then cor($r, r_I$) $\geq \sqrt{19/20} = 0.974$. Hence the correlation of ESP(I) and ESP(F) will typically also be high. (For PCR, the following variant should work better: take $U_j = \hat{d}_j^T x$ and let $W_1$ be the $U_j$ with the highest squared correlation with Y, $W_2$ the $U_j$ with the second highest squared correlation, etc.)
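The relationship between $C_p(I)$ and cor($r, r_I$) is easy to check numerically. A base R sketch using the simulated data from Section 1, with the submodel $I = \{1, 2\}$ chosen purely for illustration:
full <- lm(y ~ x)                          # full OLS model with p + 1 coefficients
sub  <- lm(y ~ x[, 1:2])                   # submodel I = {1, 2}, so k = 3 with the constant
r <- resid(full); rI <- resid(sub)
k <- length(coef(sub))
CpI <- sum(rI^2) / summary(full)$sigma^2 - n + 2 * k     # Mallows' C_p(I)
cor(r, rI)                                               # observed correlation of the residuals
sqrt((n - (p + 1)) / (CpI + n - 2 * k))                  # matches the displayed identity
sqrt(1 - (p + 1) / n)                                    # Equation (4) lower bound when C_p(I) <= 2k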
Machine learning methods for the multiple linear regression model can be incorporated as follows. Let k be the number of predictors selected by lasso, including a constant. Standardize the predictors to have unit sample variance and run the method. Let model I contain the variables corresponding to the $k - 1$ predictor variables that have the largest $|\hat{\beta}_i|$. Fit the OLS model I to these predictors and a constant. If $C_p(I) < \min(2k, p+1)$, use model I; otherwise, use the full OLS model. Many variants are possible. In low dimensions, comparisons between methods like lasso, PCR, PLS, and envelopes might use prediction intervals, the amount of dimension reduction, and standard errors if available.
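A sketch of one such variant, assuming the glmnet package is available and continuing with the simulated data; the exact variant used in practice may differ.
library(glmnet)
xs <- scale(x)                                  # predictors standardized to unit sample variance
cvfit <- cv.glmnet(xs, y)                       # lasso with 10-fold cross validation
b <- coef(cvfit, s = "lambda.min")[-1]          # lasso slopes, intercept dropped
k <- sum(b != 0) + 1                            # predictors selected by lasso, plus a constant
keep <- order(abs(b), decreasing = TRUE)[seq_len(k - 1)]
fitI <- lm(y ~ x[, keep, drop = FALSE])         # OLS model I on the k - 1 retained predictors
full <- lm(y ~ x)
CpI  <- sum(resid(fitI)^2) / summary(full)$sigma^2 - n + 2 * k
fit  <- if (CpI < min(2 * k, p + 1)) fitI else full      # otherwise keep the full OLS model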
If the above procedure is used, then model selection estimators, such as $\hat{\beta}_{MSPLS}$, produce predictions that are similar to those of the OLS full model if $n \geq 20(p+1)$. Other model selection criteria, such as k-fold cross validation, tend to behave like $C_p$ in low dimensions, but getting bounds like Equation (4) may be difficult. Empirically, variable selection estimators and model selection estimators often do not select the full model. Equation (4) suggests that “weak” predictors will often be omitted, as long as cor($r, r_I$) stays high. (If the predictors are not orthogonal, “weak” might mean the predictor is not very useful given that the other predictors are in the model.)
It is common in the model selection literature to assume, for the full model, that there is a model S such that $\beta_i \neq 0$ for $i \in S$, and $\beta_i = 0$ for $i \notin S$. Then model I underfits unless $S \subseteq I$. If $S \not\subseteq I$, then an “important” predictor has been left out of the model. Then under the model $x^T \beta = x_S^T \beta_S$, cor($r, r_I$) will not converge to 1 as $n \to \infty$, and for large enough n, $[\mbox{cor}(r, r_I)]^2 \leq \gamma < 1$. Thus $C_p(I) \to \infty$ as $n \to \infty$. Hence $P(S \subseteq I_{min}) \to 1$ as $n \to \infty$ where $I_{min}$ corresponds to the set of predictors selected by a variable selection method such as forward selection or lasso variable selection. Thus the probability that the model selection estimator underfits goes to zero as $n \to \infty$ if p is fixed, the full model is one of the models considered, and the $C_p$ criterion is used, as noted by [7].
For real data, an important question in variable selection is whether $\beta_i = 0$ is a reasonable assumption. If $X$ has full rank $p + 1$, then having $\beta_i$ equal to zero for 20 decimal places may not be reasonable. See, for example, [8,9,10]. Then the probability that the variable selection estimator chooses the full model goes to one if the probability of underfitting goes to 0 as $n \to \infty$. If $\hat{\beta}_I$ is $a \times 1$, use zero padding to form the $p \times 1$ vector $\hat{\beta}_{I,0}$ from $\hat{\beta}_I$ by adding 0s corresponding to the omitted variables.
For example, if $p = 3$ and $Y = \alpha + x_1 + 0 x_2 + 0 x_3 + e$, then $S = \{1\}$ and $\beta_{S,0} = (1, 0, 0)^T = \beta = (\beta_1, \beta_2, \beta_3)^T = \beta_F$. This population model has $a = a_S = 1$ active predictor. Then the $J = 2^p = 8$ possible subsets of $F = \{1, 2, \ldots, p\}$ are $I_1 = \emptyset$, $S = I_2 = \{1\}$, $I_3 = \{2\}$, $I_4 = \{3\}$, $I_5 = \{1, 2\}$, $I_6 = \{1, 3\}$, $I_7 = \{2, 3\}$, and $F = I_8 = \{1, 2, 3\}$. There are $2^{p - a_S} = 4$ subsets $I_2$, $I_5$, $I_6$, and $I_8$ such that $S \subseteq I_j$. Let $\hat{\beta}_{I_7} = (\hat{\beta}_2, \hat{\beta}_3)^T$ and $x_{I_7} = (x_2, x_3)^T$. If $\hat{\beta}_{I_{min}} = (\hat{\beta}_1, \hat{\beta}_3)^T$, then the observed variable selection estimator $\hat{\beta}_{VS} = \hat{\beta}_{I_{min},0} = (\hat{\beta}_1, 0, \hat{\beta}_3)^T$. As a statistic, $\hat{\beta}_{VS} = \hat{\beta}_{I_k,0}$ with probabilities $\pi_{kn} = P(I_{min} = I_k)$ for $k = 1, \ldots, J$ where there are J subsets, e.g., $J = 2^p - 1$. Theory for the variable selection estimator $\hat{\beta}_{VS}$ is complicated. See [7] for models such as multiple linear regression, GLMs, and the [11] Cox proportional hazards regression model.

2.2. Sparse Fitted Models

A fitted or population regression model is sparse if a of the predictors are active (have nonzero $\hat{\beta}_i$ or $\beta_i$) where $n \geq Ja$ with $J \geq 10$. Otherwise, the model is nonsparse. A high-dimensional population full regression model is abundant or dense if the regression information is spread out among the p predictors (nearly all of the predictors are active). Hence an abundant model is a nonsparse model. Under the above definitions, most classical low-dimensional models use sparse fitted models, and statisticians have over one hundred years of experience with such models.
The literature for high-dimensional sparse regression models often assumes that (i) $\beta_{I,0} = \beta = \beta_F$, that (ii) $S \subseteq I$ where I uses k predictors including a constant, and that (iii) $n \geq 10k$. When these assumptions hold, the population model is sparse, the fitted model is sparse, and Equation (3) becomes $\|\hat{\beta}_{I,0} - \beta\|^2$, which can be small. Getting rid of assumption (i) and the assumption that $S \subseteq I$ greatly increases the applicability of variable selection estimators, such as forward selection, lasso, and the elastic net, for high-dimensional data, even if $\|\hat{\beta}_{I,0} - \beta\|^2$ is huge. As argued in the following paragraphs, the sparse fitted model often fits the data well, and often $\hat{\beta}_I$ is a good estimator of $\beta_I$.
A sparse fitted model transforms a high-dimensional problem into a low-dimensional problem, and the sparse fitted model can be checked with the goodness of fit diagnostics available for that low-dimensional model. If the predictors used by the sparse fitted regression model are $x_I$, and if the regression model depends on $x_I$ only through the sufficient predictor $SP = \alpha_I + x_I^T \beta_I$, then a useful diagnostic is the response plot of $ESP(I) = \hat{\alpha}_I + x_I^T \hat{\beta}_I$ versus the response Y on the vertical axis. If there is goodness of fit, then $\hat{\beta}_I$ tends to estimate $\beta_I$ regardless of whether the population model is sparse or nonsparse. Data splitting may be needed for valid inference such as hypothesis testing.
Suppose the cases $(x_i^T, Y_i)^T$ are iid for $i = 1, \ldots, n$. Then $Y_1, \ldots, Y_n$ are iid, resulting in a valid sparse fitted model regardless of whether the population model is sparse or nonsparse. This null model omits all of the predictors. For high-dimensional data, a reasonable goal is to find a model that greatly outperforms the null model.
The sparse fitted model using $(Y, x_I)$ is often useful when there are one or more strong predictors. The following [12] theorem gives two more situations where a sparse fitted model can greatly outperform the null model. The population models in Theorem 1 can be sparse or nonsparse. The high-dimensional multiple linear regression literature often assumes that the cases are iid from a multivariate normal distribution, and that the population model is sparse. Let $\Sigma_Y = \sigma^2_Y$. For multiple linear regression, note that $\sigma^2_O < \sigma^2_Y = \Sigma_Y$ unless $\eta^T \Sigma_{xY} = 0$.
Theorem 1.
Suppose the cases $(Y_i, x_i^T)^T$ are iid from some distribution.
(a) If the joint distribution of $(Y, x^T)^T$ is multivariate normal,
$$\begin{pmatrix} Y \\ x \end{pmatrix} \sim N_{p+1}\left( \begin{pmatrix} \mu_Y \\ \mu_x \end{pmatrix}, \begin{pmatrix} \Sigma_Y & \Sigma_{Yx} \\ \Sigma_{xY} & \Sigma_x \end{pmatrix} \right),$$
then $Y|x \sim Y|(\alpha_{OLS} + \beta_{OLS}^T x) \sim N(\alpha_{OLS} + \beta_{OLS}^T x, \sigma^2)$ follows a multiple linear regression model, but so does $Y|\eta^T x \sim N(\alpha_O + \beta_O^T x, \sigma_O^2)$, where $\alpha_O = \mu_Y - \beta_O^T \mu_x$, $\beta_O = \lambda \eta$, $\sigma_O^2 = \Sigma_Y - \beta_O^T \Sigma_{xY}$, and
$$\lambda = \frac{\Sigma_{xY}^T \eta}{\eta^T \Sigma_x \eta}.$$
(b) If the response Y is binary, then $Y|(\alpha_O + \beta_O^T x) \sim \mbox{binomial}(m = 1, \rho(\alpha_O + \beta_O^T x))$ where $E[Y|(\alpha_O + \beta_O^T x)] = \rho(\alpha_O + \beta_O^T x) = P[Y = 1|(\alpha_O + \beta_O^T x)]$. Hence every linear combination of the predictors satisfies a binary regression model.

2.3. PCA-PLS

Another technique is to use PCA for dimension reduction. Let $U_1, \ldots, U_p$ be the PCA linear combinations ($U_i = \hat{\gamma}_i^T x$) ordered with respect to the largest eigenvalues. Then use $U_1, \ldots, U_k$ in the regression or classification model where k is chosen in some manner. This method can be used for models with m response variables $Y_1, \ldots, Y_m$. See, for example, [13,14,15,16].
Consider a low- or high-dimensional regression or classification method with a univariate response variable Y. Let $W_1, \ldots, W_p$ be the linear combinations ordered with respect to the highest squared correlations $r_1^2 \geq r_2^2 \geq \cdots \geq r_p^2$, where the sample correlation $r_{i,Y} = \mbox{cor}(x_i, Y)$. From a model selection viewpoint, using $W_1, \ldots, W_k$ should work much better than using $U_1, \ldots, U_k$. Also, the PLS components $W_i$ should be used instead of the PCA $W_i$, since the PLS components are chosen to be fairly highly correlated with Y. See Equation (2). Ref. [17] (pp. 71–72) shows that an equivalent way to compute the k-component PLS estimator is to maximize $\hat{\gamma}^T \hat{\Sigma}_{xY}$ under some constraints. If the predictors are standardized to have unit sample variance, then this method becomes a correlation vector optimization problem. Ref. [18] uses the PLS components as predictors for nonlinear regression, but the above model selection viewpoint is new.
From canonical correlation analysis (CCA), if $(Y_i, x_i^T)^T$ are iid, then
$$M = \max_{\gamma \neq 0} \mbox{Cor}(\gamma^T x, Y) = \max_{\gamma \neq 0} \frac{\gamma^T \Sigma_{xY}}{\sqrt{\Sigma_Y \ \gamma^T \Sigma_x \gamma}}.$$
This optimization problem is equivalent to maximizing
$$\Sigma_Y M^2 = \max_{\gamma \neq 0} \frac{\gamma^T \Sigma_{xY} \Sigma_{xY}^T \gamma}{\gamma^T \Sigma_x \gamma},$$
which has a maximum at $\gamma = \Sigma_x^{-1} \Sigma_{xY} = \beta_{OLS}$. See [19] (pp. 168, 282). Hence PLS is a lot like CCA for $(Y_i, x_i^T)^T$ but with more constraints, and PLS can be computed in high dimensions. From the dimension reduction literature, if Y depends on x only through $\alpha + \beta^T x$, then under the assumption of “linearly related predictors,” $\hat{\beta}_{OLS}$ estimates $\beta_{OLS} = c\beta$ for some constant c which is often nonzero. See, for example, [20] (p. 432).
The above results suggest computing lasso for multiple linear regression, finding the number of predictors k chosen by lasso, and taking k linear combinations $W_i$. An SC scree plot of i versus $r_i^2$ behaves like a scree plot of i versus the eigenvalues. Hence quantities like $\sum_{i=1}^j r_i^2 / \sum_{i=1}^p r_i^2$ are of interest for $j = 1, \ldots, p$, and scree plot techniques could be adapted to choose k. Many other possibilities exist, and there are many possibilities for models with m response variables $Y_1, \ldots, Y_m$. See [18] for some ideas.
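A minimal sketch of an SC scree plot for the simulated predictors x and response y from Section 1; the author's SCscree function in slpack may differ in its details.
r2  <- drop(cor(x, y))^2                   # squared sample correlations r_i^2 of each predictor with Y
ord <- order(r2, decreasing = TRUE)        # predictor indices, strongest first (like $indx in Example 2)
plot(seq_along(r2), r2[ord], type = "b", xlab = "i",
     ylab = "ordered squared correlation", main = "SC scree plot")
cumsum(r2[ord]) / sum(r2)                  # cumulative proportions (like $csums in Example 2)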
Another useful technique is to eliminate weak predictors before finding $W_1, \ldots, W_k$. By Equation (3), $\hat{\gamma}_i$ may not be close to $\gamma_i$ in high dimensions, e.g., $p = n^6$. For example, the sample eigenvectors $\hat{d}_i$ tend to be poor estimators of the population eigenvectors $d_i$ of $\Sigma_x$. An exception is when the correlation $\mbox{Cor}(x_i, x_j) = \rho$ for $i \neq j$ where $\rho$ is close to 1. See [21]. One possibility is to take the j predictors $x_i$ with the highest squared correlations with Y. The SC scree plot is useful. Then do lasso (meant for the multiple linear regression model) to further reduce the number of $x_i$. Here j should be proportional to n, for example, $j = \min(Kn, p)$, where $K = 1$ is an interesting choice. When n is small, spurious correlations can be a problem as follows: if the actual correlation is near 0, the sample size n may need to be large before the sample correlation $r_i$ is near 0. For more on the importance of eliminating weak predictors and high-dimensional variable selection, see, for example, [22,23,24,25,26].
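A sketch of this screening step with K = 1, keeping the j = min(n, p) predictors most highly correlated with Y before running lasso (again assuming the glmnet package); with the toy data from Section 1, j = p, but in high dimensions j would be much smaller than p.
library(glmnet)
j    <- min(n, p)                               # K = 1 in j = min(Kn, p)
keep <- order(drop(cor(x, y))^2, decreasing = TRUE)[1:j]
xk   <- x[, keep, drop = FALSE]                 # predictors surviving the correlation screen
scr  <- cv.glmnet(xk, y)                        # lasso on the screened predictors
active <- keep[which(coef(scr, s = "lambda.min")[-1] != 0)]   # predictors kept by lasso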

2.4. Stack Low-Dimensional Estimators into a Vector

Another technique is to stack low-dimensional estimators into a vector, for example $\bar{x}_2 - \bar{x}_1$, or elements from an estimated covariance matrix such as $c = \mbox{vech}(\Sigma_z)$. Using $z = (Y_1, \ldots, Y_m, x_1, \ldots, x_p)^T$ can give information about a multivariate regression. Let $\hat{\theta}_F$ and $\theta_F$ denote the model that uses $(\theta_1, \ldots, \theta_p)^T$, and let I denote the model that uses $(\theta_{i_1}, \ldots, \theta_{i_k})^T$. Then some of these estimators satisfy $\sqrt{n}(\hat{\theta}_F - \theta_F) \stackrel{D}{\to} N_p(0, \Sigma_F)$ and $\sqrt{n}(\hat{\theta}_I - \theta_I) \stackrel{D}{\to} N_k(0, \Sigma_I)$, where the estimator $\hat{\Sigma}_F$ is easy to compute in high dimensions but is singular if $p > n$, while the estimator $\hat{\Sigma}_I$ is easy to compute in high dimensions and is nonsingular if $n > Jk$ with $J \geq 10$. Often, the theory for F uses z while the theory for I uses $u = (z_{i_1}, \ldots, z_{i_k})^T$. (Values of J much larger than 10 may be needed if some of the $z_{i_j}$ are skewed.)
Then the following simple testing method reduces a possibly high-dimensional problem to a low-dimensional problem. Consider testing $H_0: A\theta = 0$ versus $H_1: A\theta \neq 0$ where A is a $d \times p$ constant matrix and the hypothesis test is equivalent to testing $H_0: B\theta_I = 0$ versus $H_1: B\theta_I \neq 0$ where B is a $d \times k$ constant matrix. For example, tests such as $H_0: \theta_i = 0$ or $H_0: \theta_i - \theta_k = 0$ are often of interest.
The marginal maximum likelihood estimator (MMLE) and one component partial least squares (OPLS) estimators stack low-dimensional estimators into a vector. In low dimensions, the OLS estimators are $\hat{\alpha}_{OLS} = \bar{Y} - \hat{\beta}_{OLS}^T \bar{x}$ and
$$\hat{\beta}_{OLS} = \hat{\Sigma}_x^{-1} \hat{\Sigma}_{xY} = \hat{\Sigma}_x^{-1} \hat{\eta}.$$
For a multiple linear regression model with iid cases, $\hat{\beta}_{OLS}$ is a consistent estimator of $\beta_{OLS} = \Sigma_x^{-1} \Sigma_{xY}$ under mild regularity conditions, while $\hat{\alpha}_{OLS}$ is a consistent estimator of $E(Y) - \beta_{OLS}^T E(x)$.
Refs. [27,28] showed that $\hat{\beta}_{OPLS} = \hat{\lambda} \hat{\Sigma}_{xY}$ estimates $\lambda \Sigma_{xY} = \beta_{OPLS}$ where
$$\lambda = \frac{\Sigma_{xY}^T \Sigma_{xY}}{\Sigma_{xY}^T \Sigma_x \Sigma_{xY}} \ \ \mbox{ and } \ \ \hat{\lambda} = \frac{\hat{\Sigma}_{xY}^T \hat{\Sigma}_{xY}}{\hat{\Sigma}_{xY}^T \hat{\Sigma}_x \hat{\Sigma}_{xY}}$$
for $\Sigma_{xY} \neq 0$. If $\Sigma_{xY} = 0$, then $\beta_{OPLS} = 0$. Let $\hat{\eta}_{OPLS} = \hat{\Sigma}_{xY}$. Testing $H_0: A\beta_{OPLS} = 0$ versus $H_1: A\beta_{OPLS} \neq 0$ is equivalent to testing $H_0: A\eta = 0$ versus $H_1: A\eta \neq 0$ where A is a $k \times p$ constant matrix and $\eta = \Sigma_{xY}$.
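The OPLS estimator is simple to compute from the sample covariance quantities; a base R sketch using the simulated data from Section 1:
SxY    <- cov(x, y)                                         # etahat_OPLS = Sigmahat_xY
lamhat <- drop(crossprod(SxY) / (t(SxY) %*% cov(x) %*% SxY))
beta_opls  <- lamhat * SxY                                  # betahat_OPLS = lambdahat * Sigmahat_xY
alpha_opls <- mean(y) - sum(beta_opls * colMeans(x))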
The marginal maximum likelihood estimator (marginal least squares estimator) is due to [22,23]. This estimator computes the marginal regression of Y on $x_i$, resulting in the estimator $(\hat{\alpha}_{i,M}, \hat{\beta}_{i,M})$ for $i = 1, \ldots, p$. Then $\hat{\beta}_{MMLE} = (\hat{\beta}_{1,M}, \ldots, \hat{\beta}_{p,M})^T$. For multiple linear regression, the marginal estimators are the simple linear regression (SLR) estimators, and $(\hat{\alpha}_{i,M}, \hat{\beta}_{i,M}) = (\hat{\alpha}_{i,SLR}, \hat{\beta}_{i,SLR})$. Hence
$$\hat{\beta}_{MMLE} = [\mbox{diag}(\hat{\Sigma}_x)]^{-1} \hat{\Sigma}_{xY}.$$
If the w i are the predictors standardized to have unit sample variances, then
$$\hat{\beta}_{MMLE} = \hat{\beta}_{MMLE}(w, Y) = \hat{\Sigma}_{wY} = I^{-1} \hat{\Sigma}_{wY} = \hat{\eta}_{OPLS}(w, Y)$$
where ( w , Y ) denotes that Y was regressed on w, and I is the p × p identity matrix. Hence the SC scree plot is closely related to the MMLE for multiple linear regression with standardized predictors.
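A corresponding base R sketch for the MMLE for multiple linear regression, again using the simulated data:
beta_mmle <- drop(cov(x, y)) / diag(cov(x))     # [diag(Sigmahat_x)]^{-1} Sigmahat_xY
w <- scale(x)                                   # predictors standardized to unit sample variance
beta_mmle_w <- drop(cov(w, y))                  # equals etahat_OPLS(w, Y) since diag(cov(w)) is all 1s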
Consider a subset of k distinct elements from $\hat{\Sigma}$. Stack the elements into a vector, and let each vector have the same ordering. For example, the largest subset of distinct elements corresponds to
$$\mbox{vech}(\hat{\Sigma}) = (\hat{\sigma}_{11}, \ldots, \hat{\sigma}_{1p}, \hat{\sigma}_{22}, \ldots, \hat{\sigma}_{2p}, \ldots, \hat{\sigma}_{p-1,p-1}, \hat{\sigma}_{p-1,p}, \hat{\sigma}_{pp})^T = [\hat{\sigma}_{jk}].$$
For random variables $x_1, \ldots, x_p$, use notation such as $\bar{x}_j$ = the sample mean of $x_j$, $\mu_j = E(x_j)$, and $\sigma_{jk} = \mbox{Cov}(x_j, x_k)$. Let
$$n \ \mbox{vech}(\hat{\Sigma}) = [n \hat{\sigma}_{jk}] = \left[ \sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k) \right].$$
For general vectors of elements, the ordering of the vectors will all be the same and will be denoted by vectors such as $\hat{c} = [\hat{\sigma}_{jk}]$ and $c = [\sigma_{jk}]$.
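In R, the vech ordering can be extracted from the lower triangle of the symmetric matrix, which by symmetry matches the row-wise ordering displayed above; a short sketch:
Sighat <- cov(x)
vechS  <- Sighat[lower.tri(Sighat, diag = TRUE)]   # (sigmahat_11, sigmahat_12, ..., sigmahat_pp)^T
length(vechS) == p * (p + 1) / 2                   # number of distinct elements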
Ref. [29] proved that $\sqrt{n}(\hat{c} - c) \stackrel{D}{\to} N_d(0, \Sigma_F)$ if $\theta = c$ is a $d \times 1$ vector. The theorem may be a special case of the [30] theory for the multivariate linear regression estimator when there are no predictors. Ref. [31] also gave similar large sample theory, for example, for $c = \mbox{vech}(\Sigma_x)$, but the proof in [29] and the estimator $\hat{\Sigma}_F$ are much simpler. Also see [32].
The following [12] large sample theory for $\hat{\Sigma}_{xY}$ is also a special case. Let $x_i = (x_{i1}, \ldots, x_{ip})^T$ and let $w_i$ and $z_i$ be defined below, where
$$\mbox{Cov}(w_i) = \Sigma_w = E[(x_i - \mu_x)(x_i - \mu_x)^T (Y_i - \mu_Y)^2] - \Sigma_{xY} \Sigma_{xY}^T.$$
Then the low-order moments are needed for $\hat{\Sigma}_z$ to be a consistent estimator of $\Sigma_w$.
Theorem 2.
Assume the cases $(x_i^T, Y_i)^T$ are iid. Assume $E(x_{ij}^k Y_i^m)$ exist for $j = 1, \ldots, p$ and $k, m = 0, 1, 2$. Let $\mu_x = E(x)$ and $\mu_Y = E(Y)$. Let $w_i = (x_i - \mu_x)(Y_i - \mu_Y)$ with sample mean $\bar{w}_n$. Let $\eta = \Sigma_{xY}$. Then (a)
$$\sqrt{n}(\bar{w}_n - \eta) \stackrel{D}{\to} N_p(0, \Sigma_w) \ \ \mbox{ and } \ \ \sqrt{n}(\hat{\eta}_n - \eta) \stackrel{D}{\to} N_p(0, \Sigma_w).$$
(b) Let $z_i = x_i(Y_i - \bar{Y}_n)$ and $v_i = (x_i - \bar{x}_n)(Y_i - \bar{Y}_n)$. Then $\hat{\Sigma}_w = \hat{\Sigma}_z + O_P(n^{-1/2}) = \hat{\Sigma}_v + O_P(n^{-1/2})$.
(c) Let A be a $k \times p$ full rank constant matrix with $k \leq p$, assume $H_0: A\beta_{OPLS} = 0$ is true, and assume $\hat{\lambda} \stackrel{P}{\to} \lambda \neq 0$. Then
$$\sqrt{n} \ A(\hat{\beta}_{OPLS} - \beta_{OPLS}) \stackrel{D}{\to} N_k(0, \lambda^2 A \Sigma_w A^T).$$
This method of hypothesis testing does not depend on whether the population model is sparse or abundant, and it does not need data splitting for valid inference. Data splitting with sparse fitted models can also be used for high-dimensional hypothesis testing. See, for example, [12]. Ref. [29] also provides the theory for the OPLS estimator and MMLE for multiple linear regression where heterogeneity is possible and where the predictors may have been standardized to have unit sample variances.
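A sketch of a Wald-type test of $H_0: A\beta_{OPLS} = 0$ based on Theorem 2, with $\hat{\Sigma}_z$ used as the estimator of $\Sigma_w$; the matrix A below is an arbitrary illustration, and this is not the authors' code.
zi     <- x * drop(y - mean(y))                # rows z_i = x_i (Y_i - Ybar_n)
Sigz   <- cov(zi)                              # Sigmahat_z, consistent for Sigma_w by Theorem 2 (b)
etahat <- cov(x, y)                            # Sigmahat_xY = etahat
A  <- diag(p)[1:2, , drop = FALSE]             # illustrative A: tests the first two coefficients
D2 <- drop(n * t(A %*% etahat) %*% solve(A %*% Sigz %*% t(A)) %*% (A %*% etahat))
pchisq(D2, df = nrow(A), lower.tail = FALSE)   # asymptotic p-value for H0: A beta_OPLS = 0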
The MMLE for multiple linear regression is often used for variable selection if the predictors have been standardized to have unit sample variances. Then the k predictors with the largest $|\hat{\beta}_i|$ correspond to the k predictors with the largest squared correlations with Y. Hence this method can be used to select k in Section 2.4.

2.5. Alternative Dispersion Estimators

Let $\hat{\Sigma}$ be a $p \times p$ symmetric positive semi-definite matrix such as $R$, $R^{-1}$, $\hat{\Sigma}_x$, $\hat{\Sigma}_x^{-1}$, $X^T X$, or $(X^T X)^{-1}$. When $\hat{\Sigma}$ is singular or ill conditioned, some common techniques are to replace $\hat{\Sigma}$ with a symmetric positive definite matrix $\hat{D}$ such as $\hat{D} = \mbox{diag}(\hat{\Sigma})$, $\hat{D} = \hat{\Sigma} + \lambda I_p$ where the constant $\lambda > 0$, or $\hat{D} = D = I_p$. Regularized estimators are also used.
For $\delta \geq 0$, a simple way to regularize a $p \times p$ correlation matrix $R = (r_{ij})$ is to use
$$R_\delta = \frac{1}{1 + \delta}(R + \delta I_p) = (t_{ij})$$
where $t_{ii} = 1$ and
$$t_{ij} = \frac{r_{ij}}{1 + \delta}$$
for $i \neq j$. Note that each correlation $r_{ij}$ is divided by the same factor $1 + \delta$. If $\lambda_i$ is the ith eigenvalue of R, then $(\lambda_i + \delta)/(1 + \delta)$ is the ith eigenvalue of $R_\delta$. The eigenvectors of $R$ and $R_\delta$ are the same since if $Rx = \lambda_i x$, then
$$R_\delta x = \frac{1}{1 + \delta}(R + \delta I_p)x = \frac{1}{1 + \delta}(\lambda_i + \delta)x.$$
Note that $R_\delta = \kappa R + (1 - \kappa) I_p$, where $\kappa = 1/(1 + \delta) \in (0, 1]$. See [33,34].
Following [35], the condition number of a symmetric positive definite $p \times p$ matrix A is $\mbox{cond}(A) = \lambda_1(A)/\lambda_p(A)$, where $\lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_p(A) > 0$ are the eigenvalues of A. Note that $\mbox{cond}(A) \geq 1$. A well conditioned matrix has condition number $\mbox{cond}(A) \leq c$ for some number c such as 50 or 500. Hence $R_\delta$ is nonsingular for $\delta > 0$ and well conditioned if
$$\mbox{cond}(R_\delta) = \frac{\lambda_1 + \delta}{\lambda_p + \delta} \leq c,$$
or
$$\delta = \max\left(0, \ \frac{\lambda_1 - c\lambda_p}{c - 1}\right)$$
if $1 < c \leq 500$. Taking $c = 50$ suggests using
$$\delta = \max\left(0, \ \frac{\lambda_1 - 50\lambda_p}{49}\right).$$
The matrix can be further regularized by setting $t_{ij} = 0$ if $|t_{ij}| \leq \tau$, where $\tau \in [0, 1)$ should be less than 0.5. Denote the resulting matrix by $R(\delta, \tau)$. We suggest using $\tau = 0.05$. Note that $R_\delta = R(\delta, 0)$. Using $\tau$ is known as thresholding. We recommend computing $I_p$, $R(\delta, 0)$, and $R(\delta, 0.05)$ for c = 50, 100, 200, 300, 400, and 500. Compute R if it is nonsingular. Note that a regularized covariance matrix can be found using
$$S(\delta, \tau) = D_S \ R(\delta, \tau) \ D_S \qquad (9)$$
where $S = \hat{\Sigma}_x$ and $D_S = \mbox{diag}(\sqrt{S_{11}}, \ldots, \sqrt{S_{pp}})$.
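A sketch of this regularization given a correlation matrix R, a target condition number c (the argument name mirrors the paper's notation), and a threshold tau; the author's corrlar function may differ in its details.
reg_cor <- function(R, c = 50, tau = 0.05) {
  p  <- nrow(R)
  ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values    # lambda_1 >= ... >= lambda_p
  delta <- max(0, (ev[1] - c * ev[p]) / (c - 1))                 # forces cond(R_delta) <= c
  Rd <- (R + delta * diag(p)) / (1 + delta)                      # R(delta, 0)
  Rt <- Rd
  Rt[abs(Rt) <= tau] <- 0                                        # thresholding gives R(delta, tau)
  list(Rd = Rd, Rt = Rt, delta = delta)
}
out  <- reg_cor(cor(x), c = 50, tau = 0.05)
Ds   <- diag(sqrt(diag(cov(x))))
Sreg <- Ds %*% out$Rt %*% Ds                                     # S(delta, tau) of Equation (9)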
A common type of regularization of a covariance matrix S is to use $S_D = \mbox{diag}(S)$ where the off-diagonal elements of $S_D$ are 0 and $S_D(i,i) = S(i,i)$. The corresponding correlation matrix is the identity matrix, and Mahalanobis distances using the identity matrix correspond to Euclidean distances. These estimators tend to use too much regularization and underfit. Note that as $\delta \to \infty$, $R_\delta \to I_p$, and $I_p$ has $c = 1$. Note that $S_D$ corresponds to using $R(\delta = \infty, 0) = I_p$ in Equation (9).
For the population correlation matrix $\rho_x$ and the population precision matrix $\rho_x^{-1}$, the literature often claims that most of the population correlations $\rho_{ij} = 0$, so that the population matrix is sparse, and that $\hat{D}$ is a good estimator of the population matrix. Assume that $\hat{D}$ estimates a population dispersion matrix D. Note that this assumption always holds when $\hat{D} = I_p = D$. Note that $\mbox{diag}(S)$ estimates $\mbox{diag}(\Sigma_x)$ since $(\hat{\sigma}_1^2, \ldots, \hat{\sigma}_p^2)^T$ estimates $(\sigma_1^2, \ldots, \sigma_p^2)^T$ where $\sigma_i^2 = V(x_i)$ for $i = 1, \ldots, p$. However, by Equation (3), the estimator tends to not be good in high dimensions.
Consider testing $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$, where a $g \times 1$ statistic $T_n$ satisfies $\sqrt{n}(T_n - \theta) \stackrel{D}{\to} u \sim N_g(0, \Sigma)$. If $\hat{\Sigma}^{-1} \stackrel{P}{\to} \Sigma^{-1}$ and $H_0$ is true, then
$$D_n^2 = D_{\theta_0}^2(T_n, \hat{\Sigma}/n) = n(T_n - \theta_0)^T \hat{\Sigma}^{-1}(T_n - \theta_0) \stackrel{D}{\to} u^T \Sigma^{-1} u \sim \chi_g^2$$
as $n \to \infty$. Then a Wald-type test rejects $H_0$ if $D_n^2 > \chi_{g, 1-\delta}^2$ where $P(X \leq \chi_{g, 1-\delta}^2) = 1 - \delta$ if $X \sim \chi_g^2$, a chi-square distribution with g degrees of freedom. Note that $D_{\theta_0}^2(T_n, \hat{\Sigma}/n)$ is a squared Mahalanobis distance.
It is common to implement a Wald-type test using
$$D_n^2 = D_{\theta_0}^2(T_n, C_n/n) = n(T_n - \theta_0)^T C_n^{-1}(T_n - \theta_0) \stackrel{D}{\to} u^T C^{-1} u$$
as $n \to \infty$ if $H_0$ is true, where the $g \times g$ symmetric positive definite matrix $\hat{D} = C_n \stackrel{P}{\to} C \neq \Sigma$. Hence $C_n$ is the wrong dispersion matrix, and $u^T C^{-1} u$ does not have a $\chi_g^2$ distribution when $H_0$ is true. Ref. [36] showed how to bootstrap Wald tests with the wrong dispersion matrix. When $C_n = I_g$, the bootstrap tests often became conservative as g increased to n. For some of these tests, the m out of n bootstrap, which draws a sample of size m without replacement from the n cases, works better than the nonparametric bootstrap. Sampling without replacement is also known as subsampling and the delete-d jackknife. For some methods, better high-dimensional tests are reviewed by [37].
Using a high-dimensional dispersion estimator with considerable outlier resistance is another useful technique. Let $W$ be a data matrix, where the rows $w_i^T$ correspond to the cases. For example, $w_i = x_i$ or $w_i = z_i = (Y_{i1}, \ldots, Y_{im}, x_{i1}, \ldots, x_{ip})^T$. One of the simplest outlier detection methods uses the Euclidean distances of the $w_i$ from the coordinatewise median, $D_i = D_i(\mbox{MED}(W), I_p)$. Concentration type steps compute the weighted median $\mbox{MED}_j$: the coordinatewise median computed from the “half set” of cases $x_i$ with $D_i^2 \leq \mbox{MED}(D_i^2(\mbox{MED}_{j-1}, I_p))$ where $\mbox{MED}_0 = \mbox{MED}(W)$. We often use $j = 0$ (no concentration type steps) or $j = 9$. Let $D_i = D_i(\mbox{MED}_j, I_p)$. Let $W_i = 1$ if $D_i \leq \mbox{MED}(D_1, \ldots, D_n) + k \, \mbox{MAD}(D_1, \ldots, D_n)$ where $k \geq 0$ and $k = 5$ is the default choice. Let $W_i = 0$, otherwise. Using $k \geq 0$ ensures that at least half of the cases get weight 1. This weighting corresponds to the weighting that would be used in a one-sided metrically trimmed mean (Huber-type skipped mean) of the distances. Here, the sample median absolute deviation is $\mbox{MAD}(n) = \mbox{MAD}(D_1, \ldots, D_n) = \mbox{MED}(|D_i - \mbox{MED}(n)|, i = 1, \ldots, n)$, where $\mbox{MED}(n) = \mbox{MED}(D_1, \ldots, D_n)$ is the sample median of $D_1, \ldots, D_n$.
Let the covmb2 set B of at least $n/2$ cases correspond to the cases with weight $W_i = 1$. Then the [38] (p. 120) covmb2 estimator $(T, C)$ is the sample mean and sample covariance matrix applied to the cases in set B. If $w_i = x_i$, then
$$T = \frac{\sum_{i=1}^n W_i x_i}{\sum_{i=1}^n W_i} \ \ \mbox{ and } \ \ C = \frac{\sum_{i=1}^n W_i (x_i - T)(x_i - T)^T}{\sum_{i=1}^n W_i - 1}.$$
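A minimal sketch of the covmb2-type weighting with no concentration steps (j = 0) and the default k = 5, applied to the simulated x; the slpack getB and covmb2 code may differ.
med    <- apply(x, 2, median)                          # coordinatewise median MED(W)
D      <- sqrt(rowSums(sweep(x, 2, med)^2))            # Euclidean distances D_i(MED(W), I_p)
cutoff <- median(D) + 5 * mad(D, constant = 1)         # MED(D) + k MAD(D) with raw MAD and k = 5
B      <- which(D <= cutoff)                           # cases with weight W_i = 1: the covmb2 set B
Tm     <- colMeans(x[B, , drop = FALSE])               # location estimator T
Cm     <- cov(x[B, , drop = FALSE])                    # dispersion estimator C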
This estimator was built for speed, applications, and outlier resistance. In low dimensions, the population dispersion matrix is the population covariance matrix of a spherically truncated distribution. In high dimensions, spherical truncation is still used, but the sample weighted median varies about the population weighted median by Equation (3).
A useful application is to apply high- (and low-) dimensional methods to the cases that get weight 1. If the ith case $w_i = (y_i^T, x_i^T)^T$ where $y = (Y_1, \ldots, Y_m)^T$, then this application can be used if all of the variables are continuous. For a variant, let the continuous predictors from $x_i$ be denoted by $u_i$ for $i = 1, \ldots, n$. Apply the covmb2 estimator to the $u_i$, and then run the method on the m cases $w_i$ corresponding to the covmb2 set B indices $i_1, \ldots, i_m$, where $m \geq n/2$. If the estimator has large sample theory “conditional” on the predictors x, then typically the same theory applies for the “robust estimator” since the response variables were not used to select the cases in B. These two applications can be used for regression, classification, neural networks, et cetera.
Another method to get an outlier-resistant estimator $\hat{\Sigma}_{xY}$ is to use the following identity. If X and Y are random variables, then
$$\mbox{Cov}(X, Y) = [\mbox{Var}(X + Y) - \mbox{Var}(X - Y)]/4.$$
Then replace Var(W) by $[\hat{\sigma}(W)]^2$, where $\hat{\sigma}(W)$ is a robust estimator of scale or standard deviation and $W = X + Y$ or $W = X - Y$. We used $\hat{\sigma}(W) = 1.483 \, \mbox{MAD}(W)$ where $\mbox{MAD}(W) = \mbox{MAD}(n) = \mbox{MAD}(W_1, \ldots, W_n)$. Hence
$$\widehat{\mbox{Cov}}(X, Y) = ([1.483 \, \mbox{MAD}(X + Y)]^2 - [1.483 \, \mbox{MAD}(X - Y)]^2)/4.$$
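A sketch of this outlier-resistant covariance estimate for two variables, using the raw MAD scaled by 1.483 as in the text:
rcov_mad <- function(u, v) {
  s <- function(w) 1.483 * mad(w, constant = 1)        # robust scale estimate sigmahat(W)
  (s(u + v)^2 - s(u - v)^2) / 4
}
rcov_mad(x[, 1], y)                                    # outlier-resistant estimate of Cov(x_1, Y)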
In low dimensions, the [38] RMVN or RFCH estimator of dispersion can be used.

3. Results

Example 1.
The [39] data were collected from n = 26 districts in Prussia in 1843. Let Y = the number of women married to civilians in the district, with a constant and predictors $x_1$ = the population of the district in 1843, $x_2$ = the number of married civilian men in the district, $x_3$ = the number of married men in the military in the district, and $x_4$ = the number of women married to husbands in the military in the district. Sometimes the person conducting the survey would not count a spouse if the spouse was not at home. Hence Y and $x_2$ are highly correlated but not equal. Similarly, $x_3$ and $x_4$ are highly correlated but not equal. We expect $\beta = \beta_{OLS} \approx (0, 1, 0, 0)^T$. Then $\hat{\beta}_{OLS} = (0.00035, 0.9995, 0.2328, 0.1531)^T$, while forward selection with OLS and the $C_p$ criterion used $\hat{\beta}_{I,0} = (0, 1.0010, 0, 0)^T$, lasso had $\hat{\beta}_L = (0.0015, 0.9605, 0, 0)^T$, lasso variable selection $\hat{\beta}_{LVS} = (0.00007, 1.006, 0, 0)^T$, $\hat{\beta}_{MMLE} = (0.1782, 1.0010, 48.5630, 51.5513)^T$, and $\hat{\beta}_{OPLS} = (0.1727, 0.0311, 0.00018, 0.00018)^T$. With scaled predictors t, $\hat{\beta}_{OLS}(t, Y)$ = (81.0283, 40,877.4086, 104.8576, 66.2739)^T while $\hat{\beta}_{MMLE}(t, Y) = \hat{\Sigma}_{tY} = \hat{\eta}_{OPLS}(t, Y)$ = (40,678.97, 40,937.98, 21,877.44, 22,308.46)^T. The fitted values from the MMLE estimator tend to not estimate Y. The fitted values from the different estimators did have high correlation, but population values, such as $\beta_{OPLS}$ and $\beta_{OLS}$, likely differ because $x_1$ and $x_2$ have very high correlation with Y.
Example 2.
The [40] pottery data have n = 36 pottery shards of Roman earthware produced between the second century B.C. and the fourth century A.D. Often the pottery was stamped by the manufacturer. A chemical analysis was done for p = 20 chemicals (variables), and the types of pottery were 1, Arretine; 2, not Arretine; 3, North Italian; 4, Central Italian; and 5, questionable origin. These codes were used for the response variable Y. Figure 1 shows the SC scree plot for this data set. Seven of the predictors ($x_7$, $x_{16}$, $x_9$, $x_{20}$, $x_{12}$, $x_2$, $x_{11}$) had little correlation with Y. Changing the codes of the last 4 groups to 0 (binary response) gave a similar plot (not shown). Some output is shown below.
zy <- pottery[,1]
zx <- pottery[,-1]
SCscree(zx,zy)
$csums
  [1] 0.1599179 0.2844740 0.4036888 0.4978070 0.5873033 0.6639728 0.7378077
  [8] 0.8028591 0.8506671 0.8930894 0.9247144 0.9519310 0.9726192 0.9808558
[15] 0.9876990 0.9939687 0.9975125 0.9990487 0.9997179 1.0000000
$indx
  [1]  5  4 17  1 10  8  6 13 18 14 15 19  3  7 16  9 20 12  2 11
The function ddplot5 plots the Euclidean distances from the coordinatewise median versus the Euclidean distances from the covmb2 location estimator. Typically, the plotted points in this DD plot cluster about the identity line, and outliers appear in the upper-right corner of the plot with a gap between the bulk of the data and the outliers.
The function rcovxy yields the classical and three robust estimators of $\eta = \Sigma_{xY}$, and it makes a scatterplot matrix of the four estimated sufficient predictors $\hat{\eta}^T x$ and Y. Only two robust estimators are made if $n < 2.5p$.
Example 3.
For the [41] data with multiple linear regression, height was the response variable while an intercept, head length, nasal height, bigonal breadth, and cephalic index were used as predictors in the multiple linear regression model. Observation 9 was deleted since it had missing values. Five individuals, cases 61–65, were reported to be about 0.75 inches tall with head lengths well over five feet!
Figure 2a shows the response plot for lasso. The identity line passes right through the outliers, which are obvious because of the large gap. Figure 2b shows the response plot from lasso for the cases in the covmb2 set B applied to the predictors, and the set B included all of the clean cases and omitted the 5 outliers. The response plot was made for all of the data, including the outliers. Prediction interval bands are also included for both plots. Both plots are useful for outlier detection, but the method for Figure 2b is better for data analysis as follows: impossible outliers should be deleted or given 0 weight, we do not want to predict that some people are about 0.75 inches tall, and we do want to predict that the people were about 1.6 to 1.8 m tall. Figure 3 shows the DD plot made using ddplot5. The five outliers are in the upper-right corner. Figure 4 shows the plot made using rcovxy. The five outliers are easy to detect, and the covmb2 estimator of $\hat{\Sigma}_{xY}^T x$, denoted by rmb2ESP, linearized the data in the upper row.

4. Discussion

Detecting outliers, eliminating weak predictors, and recognizing that $\hat{\beta}_I$ estimates $\beta_I$ are important techniques for high-dimensional statistics. Another useful technique is to fit several high-dimensional estimators and choose the one that minimizes some criterion, such as 5-fold cross validation. For multiple linear regression, useful estimators include lasso, model selection PLS, ridge regression, et cetera, using all of the predictors and deleting weak predictors. Using several estimators for high-dimensional data increases the chance of getting a good fit to the data.
Lasso often works well on high-dimensional data sets if there are strong predictors. Lasso also deletes weak predictors, and $\hat{\beta}_I$ often estimates $\beta_I$. PLS often works well on high-dimensional chemometrics data sets, but it is not really clear why PLS works well when $\hat{\gamma}_j$ is not a good estimator of $\gamma_j$. The PLS literature often assumes that $Y|x = Y|\beta_{kPLS}^T x$ where $\beta_{kPLS}$ is the population parameter for the k-component PLS estimator. Ref. [12] showed that this PLS assumption tends to be quite strong.
The R software was used. See [42]. Programs are available from the collection of R functions slpack.txt, available from (http://parker.ad.siu.edu/Olive/slpack.txt, accessed on 30 June 2025). Proofs for Equation (4) and Theorems 1 and 2 were not given, but they are available from preprints of the corresponding published papers from (http://parker.ad.siu.edu/Olive/preprints.htm, accessed on 30 June 2025).
The covmb2 estimator attempts to give a robust dispersion estimator that reduces the bias by using a big ball about $\mbox{MED}_j$ instead of a ball that contains half of the cases. The median ball is the hypersphere centered at the coordinatewise median with radius $r = \mbox{MED}(D_i(\mbox{MED}(W), I_p), i = 1, \ldots, n)$ that tends to contain $(n+1)/2$ of the cases if n is odd. The slpack function getB gives the set B of cases that got weight 1 along with the index indx of the case numbers that got weight 1.
The function corrlar produces the regularized correlation matrices Rd = $R(\delta, 0)$ and Rt = $R(\delta, \tau)$ given a correlation matrix, condition number c, and threshold tau with $\tau = 0.05$ the default. The function SCscree makes the SC scree plot.

Funding

This research received no external funding.

Data Availability Statement

The three data sets are available from (http://parker.ad.siu.edu/Olive/sldata.txt, accessed on 30 June 2025).

Acknowledgments

The author thanks the editors and referees for their work.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CCA: canonical correlation analysis
iid: independent and identically distributed
MDPI: Multidisciplinary Digital Publishing Institute
MMLE: marginal maximum likelihood estimator
OLS: ordinary least squares
OPLS: one component partial least squares
PCA: principal component analysis
PCR: principal component regression
PLS: partial least squares
SP: sufficient predictor

References

1. Cook, R.D.; Forzani, L. Partial Least Squares Regression: And Related Dimension Reduction Methods; Chapman and Hall/CRC: Boca Raton, FL, USA, 2024.
2. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
3. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning with Applications in R, 2nd ed.; Springer: New York, NY, USA, 2021.
4. Chun, H.; Keleş, S. Sparse partial least squares regression for simultaneous dimension reduction and predictor selection. J. R. Stat. Soc. Ser. B 2010, 72, 3–25.
5. Olive, D.J.; Hawkins, D.M. Variable selection for 1D regression models. Technometrics 2005, 47, 43–50.
6. Mallows, C. Some comments on Cp. Technometrics 1973, 15, 661–676.
7. Rathnayake, R.C.; Olive, D.J. Bootstrapping some GLMs and survival regression models after variable selection. Commun. Stat. Theory Methods 2023, 52, 2625–2645.
8. Gelman, A.; Carlin, J. Some natural solutions to the p-value communication problem-and why they won’t work. J. Am. Stat. Assoc. 2017, 112, 899–901.
9. Nester, M.R. An applied statistician’s creed. J. R. Stat. Soc. Ser. C 1996, 45, 401–410.
10. Tukey, J.W. The philosophy of multiple comparisons. Stat. Sci. 1991, 6, 100–116.
11. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B 1972, 34, 187–220.
12. Olive, D.J.; Zhang, L. One component partial least squares, high dimensional regression, data splitting, and the multitude of models. Commun. Stat. Theory Methods 2025, 54, 130–145.
13. Artigue, H.; Smith, G. The principal problem with principal components regression. Cogent Math. Stat. 2019, 6, 1622190.
14. Cook, R.D. Fisher lecture: Dimension reduction in regression. Stat. Sci. 2007, 22, 1–26.
15. Cook, R.D. Principal components, sufficient dimension reduction, and envelopes. Ann. Rev. Stat. Appl. 2018, 5, 533–559.
16. Zhang, J.; Chen, X. Principal envelope model. J. Stat. Plan. Infer. 2020, 206, 249–262.
17. Brown, P.J. Measurement, Regression, and Calibration; Oxford University Press: New York, NY, USA, 1993.
18. Cook, R.D.; Forzani, L. PLS regression algorithms in the presence of nonlinearity. Chemom. Intell. Lab. Syst. 2021, 213, 104307.
19. Mardia, K.V.; Kent, J.T.; Bibby, J.M. Multivariate Analysis; Academic Press: London, UK, 1979.
20. Cook, R.D.; Weisberg, S. Applied Regression Including Computing and Graphics; Wiley: New York, NY, USA, 1999.
21. Jung, S.; Marron, J.S. PCA consistency in high dimension low sample size context. Ann. Stat. 2012, 37, 4104–4130.
22. Fan, J.; Lv, J. Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B 2008, 70, 849–911.
23. Fan, J.; Song, R. Sure independence screening in generalized linear models with np-dimensionality. Ann. Stat. 2010, 38, 3217–3841.
24. Mehmood, T.; Sæbø, S.; Liland, K.H. Comparison of variable selection methods in partial least squares regression. J. Chemom. 2020, 34, e3226.
25. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
26. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320.
27. Basa, J.; Cook, R.D.; Forzani, L.; Marcos, M. Asymptotic distribution of one-component partial least squares regression estimators in high dimensions. Can. J. Stat. 2024, 52, 118–130.
28. Cook, R.D.; Helland, I.S.; Su, Z. Envelopes and partial least squares regression. J. R. Stat. Soc. Ser. B 2013, 75, 851–877.
29. Olive, D.J.; Alshammari, A.A.; Pathiranage, K.G.; Hettige, L.A.W. Testing with the one component partial least squares and the marginal maximum likelihood estimators. Commun. Stat. Theory Methods 2025.
30. Su, Z.; Cook, R.D. Inner envelopes: Efficient estimation in multivariate linear regression. Biometrika 2012, 99, 687–702.
31. Yuan, K.H.; Chan, W. Biases and standard errors of standardized regression coefficients. Psychometrika 2011, 76, 670–690.
32. Neudecker, H.; Wesselman, A.M. The asymptotic variance matrix of the sample correlation matrix. Lin. Alg. Appl. 1990, 127, 589–599.
33. Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Mult. Analys. 2004, 88, 365–411.
34. Warton, D.I. Penalized normal likelihood and ridge regularization of correlation and covariance matrices. J. Am. Stat. Assoc. 2008, 103, 340–349.
35. Datta, B.N. Numerical Linear Algebra and Applications; Brooks/Cole: Pacific Grove, CA, USA, 1995.
36. Rajapaksha, K.W.G.D.H.; Olive, D.J. Wald type tests with the wrong dispersion matrix. Commun. Stat. Theory Methods 2024, 53, 2236–2251.
37. Hu, J.; Bai, Z. A review of 20 years of naive tests of significance for high-dimensional mean vectors and covariance matrices. Sci. China Math. 2016, 59, 2281–2300.
38. Olive, D.J. Robust Multivariate Analysis; Springer: New York, NY, USA, 2017.
39. Hebbler, B. Statistics of Prussia. J. Stat. Soc. Lond. 1847, 10, 154–186.
40. Wisseman, S.U.; Hopke, P.K.; Schindler-Kaudelka, E. Multielemental and multivariate analysis of Italian terra sigillata in the world heritage museum, university of Illinois at Urbana-Champaign. Archeomaterials 1987, 1, 101–107.
41. Buxton, L.H.D. The anthropology of Cyprus. J. R. Anthrop. Inst. Great Br. Irel. 1920, 50, 183–235.
42. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024.
Figure 1. SC scree plot for pottery data.
Figure 2. Response plot for lasso and lasso applied to the covmb2 set B.
Figure 3. DD plot.
Figure 4. Diagnostics for OPLS and $\hat{\Sigma}_{xY}$.