An Approach to Canonical Correlation Analysis Based on Rényi’s Pseudodistances

Canonical Correlation Analysis (CCA) infers a pairwise linear relationship between two groups of random variables, X and Y. In this paper, we present a new procedure based on Rényi’s pseudodistances (RP) aiming to detect linear and non-linear relationships between the two groups. RP canonical analysis (RPCCA) finds canonical coefficient vectors, a and b, by maximizing an RP-based measure. This new family includes Information Canonical Correlation Analysis (ICCA) as a particular case and extends the method to distances that are inherently robust against outliers. We provide estimating techniques for RPCCA and show the consistency of the proposed estimated canonical vectors. Further, a permutation test for determining the number of significant pairs of canonical variables is described. The robustness properties of the RPCCA are examined theoretically and empirically through a simulation study, concluding that the RPCCA presents a competitive alternative to ICCA with an added advantage in terms of robustness against outliers and data contamination.


Introduction
Canonical Correlation Analysis (CCA) is a statistical technique used to identify and measure associations between two sets of variables, denoted in the following by X (q × 1) and Y (p × 1), with q ≤ p. It is appropriate in situations where multiple regression would be used but where there are multiple intercorrelated outcome variables. Hence, it allows us to summarize relationships into a smaller number of statistics while preserving the main facets of those relationships. CCA was first considered in [1] and has been widely used in the statistical literature; for example, to summarize relationships between sets of variables, to reduce the dimensionality of data or to transform two sets of variables into a new dataset of uncorrelated variables as a preprocessing step for the multiple linear regression model. More insight into CCA can be found, e.g., in [2,3].
CCA looks for two direction vectors a and b (canonical vectors) such that the linear combinations U = a^T X and V = b^T Y, the so-called canonical variables, are (linearly) correlated as much as possible. However, if no linear relationship exists between the pairs a^T X and b^T Y, CCA may fail to detect these pairs of canonical vectors. In other words, CCA can only detect linear relations between the canonical variables, but other functional relationships may exist.
The linear restriction is a significant drawback of CCA when analyzing real data with highly non-linear relationships. For example, Oulai et al. [4] presented a real situation with non-linear relationships between variables regarding the representation of a hydrological process in the delineation of homogeneous regions. In their context, the two groups of variables under consideration were hydrological variables and meteorological and/or physiographical characteristics of watersheds, and their non-linear relationship depended essentially on the physiographic characteristics of the watersheds. Additionally, Ref. [5] presented a nice application of non-linear CCA to seasonal climate forecasting. In [6], some real-life data with complex non-linear relationships that cannot be properly captured by classical CCA are also presented. There is an extensive bibliography addressing non-linear CCA. Without wishing to cite all the existing literature on the topic, we would like to mention some interesting works on the subject: [7] (Chapter 6), [8][9][10][11] and references therein.
To shed light on this problem, let us consider the following situation described in [12]: let X = (X_1, X_2)^T and Y = (Y_1, Y_2)^T be a pair of random vectors constructed, as detailed in [12], through a quadratic transformation involving Z ∼ χ²_1 independent of X. In this case, Cov(X, Y) = 0_{2×2}, and so the vectors X and Y are uncorrelated (there is no linear relation between them). Consequently, classical CCA cannot detect that, indeed, Y_1 is related (although not linearly) to X_1, even though the variables are not independent. On the other hand, as the pair (X, Y) does not follow a normal distribution (and therefore the absence of correlation does not imply independence), a hidden relationship may exist (and indeed it does exist!) that is not detected by CCA. Of course, under normality, rejecting any linear correlation using CCA implies independence between both variables.
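The exact construction of the example is given in [12]; as a quick numeric illustration of the same phenomenon (zero covariance despite functional dependence), one can take a hypothetical quadratic stand-in, Y_1 = X_1² − 1, which is not the paper's construction but exhibits the identical failure mode of CCA:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.standard_normal(n)
y1 = x1**2 - 1          # hypothetical quadratic stand-in, E[Y1] = 0

# Cov(X1, Y1) = E[X1^3] = 0: no linear relation, so CCA sees nothing...
print(f"sample covariance: {np.cov(x1, y1)[0, 1]:.4f}")
# ...yet Y1 is a deterministic function of X1:
print(f"corr(X1^2, Y1)  : {np.corrcoef(x1**2, y1)[0, 1]:.4f}")
```

Any divergence-based dependence measure, unlike the linear correlation, registers this relationship.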
It is not surprising that CCA fails in the previous example: CCA focuses on "linear trends", but the true underlying relation is quadratic. To overcome this drawback, in a pioneering paper, Yin [12] proposed the use of the Kullback-Leibler divergence and developed a new procedure called Informational Canonical Correlation Analysis (ICCA), aiming to also detect non-linear relationships between linear combinations of the components.
Let U = a^T X and V = b^T Y be linear combinations of X and Y defining a pair of canonical variables, with a ∈ R^q and b ∈ R^p. We denote by f_UV(u, v) the joint probability density function (PDF) of (U, V) and by f_U(u) (resp. f_V(v)) the marginal unidimensional PDF of U (resp. V). From a statistical point of view, the canonical variables U and V are independent if and only if their joint distribution coincides with the product of the marginal PDFs, f_UV(u, v) = f_U(u) × f_V(v); conversely, a strong dependence between U and V would result in a large statistical distance between the joint PDF and the product of the marginals. A suitable divergence should then be adopted to measure such statistical closeness of the two PDFs. The Kullback-Leibler divergence is the most commonly used measure for distinguishing two distributions, and it has great statistical importance in the field of information theory.
The Kullback-Leibler divergence between f_UV(u, v) and f_U(u) × f_V(v) is given as

d_KL(a, b) = ∫∫ f_UV(u, v) ln[ f_UV(u, v) / (f_U(u) f_V(v)) ] du dv. (1)

The above divergence is not symmetric, so it quantifies the expected inaccuracy excess from using f_U × f_V as a model when the actual PDF is f_UV; that is, the inaccuracy caused by assuming independence between the pair of canonical variables. Consequently, truly independent canonical variables should minimize the Kullback-Leibler divergence in Equation (1); conversely, functionally dependent canonical variables should maximize the divergence. For more details about the Kullback-Leibler divergence, see [13]. In this vein, ICCA aims to identify q pairwise canonical vectors a_i ∈ R^q and b_i ∈ R^p maximizing d_KL(a, b). However, the Kullback-Leibler divergence is invariant under linear transformations, and so there are infinitely many ways to define canonical vectors yielding the same objective function. Then, for identification, we constrain the canonical variables to have unit variance. Moreover, once a relationship is identified by a pair of canonical variables, we expect to exclude its effect from the consecutive canonical variables. For such a purpose, we also require that each pair of canonical variables is uncorrelated with any other pair. That is, ICCA finds q linearly independent pairs of canonical variables with unit variance maximizing (in decreasing order) the corresponding Kullback-Leibler divergence. Mathematically, to compute the i-th pair of variables, we need to solve the optimization problem

max_{a,b} d_KL(a, b) subject to a^T Σ_X a = b^T Σ_Y b = 1, a^T Σ_X a_j = b^T Σ_Y b_j = 0, j < i,

where Σ_X and Σ_Y denote the variance-covariance matrices of X and Y, respectively. We apply the same identification constraints for the RP. From Yin's [12] reinterpretation of the canonical analysis, several procedures based on divergence and entropy measures have been proposed to reduce the limitations of CCA. For example, Mandal et al. [14] considered the (α, β) divergence measures defined in [15], and Iaci and Sriram [6] used the density power divergence measures defined in [16] as a measure of statistical closeness.
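As a quick numeric illustration of Equation (1) as a dependence measure, the sketch below uses a histogram plug-in estimate of KL(f_UV || f_U × f_V) (which is the mutual information); the bin count and sample sizes are illustrative choices, not the paper's:

```python
import numpy as np

def kl_dependence(x, y, bins=20):
    """Histogram plug-in estimate of KL(f_XY || f_X f_Y), i.e. the mutual
    information of the binned variables -- zero iff they are independent."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                  # joint cell probabilities
    px = pxy.sum(axis=1, keepdims=True)        # marginal of X over bins
    py = pxy.sum(axis=0, keepdims=True)        # marginal of Y over bins
    mask = pxy > 0                             # skip empty cells (0 log 0 = 0)
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
mi_ind = kl_dependence(x, rng.standard_normal(5000))               # near 0
mi_dep = kl_dependence(x, x**2 + 0.1 * rng.standard_normal(5000))  # large
print(round(mi_ind, 3), round(mi_dep, 3))
```

The independent pair gives a value close to zero (up to the well-known positive bias of histogram estimates), while the quadratic relationship yields a clearly positive value even though its linear correlation is null.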
In [17], canonical dependence based on the squared-loss mutual information was studied. Other interesting results regarding ICCA can be seen in [18][19][20][21][22][23].
Despite its popularity, the Kullback-Leibler divergence association measure is quite sensitive to outlying observations, as pointed out in [24]. By outliers, we mean data that behave very differently from what is expected under the law modeling the relation. The main purpose of this paper is to extend the ICCA procedure to a wider family of robust methods based on the RP divergence, which remains competitive with ICCA in terms of efficiency but provides a more stable estimation of the canonical vectors in the presence of contamination in the data.
The RP family, parameterized by a tuning parameter τ controlling the trade-off between robustness and efficiency, was first considered in Jones et al. [25]. Later, Broniatowski et al. [26] demonstrated that the RP is a proper divergence: it is positive for any two densities and for all values of the tuning parameter [26,27], and it is null if (and only if) both densities coincide. The theory in [26] for independent and identically distributed random variables was extended to the case of independent but not identically distributed random variables in [28]. They termed this family of pseudodistances RP because of its similarities with Rényi's divergence measures [29]. Rényi's pseudodistance has shown promising behavior in other statistical problems, providing robust minimum RP estimators with good asymptotic and robustness properties, and it includes the Kullback-Leibler divergence as a particular case at τ = 0. For example, Toma and Leoni-Aubin [30] considered efficient and robust measures for general parametric models based on RP, and Toma et al. [31] later developed a new criterion for model selection based on the RP. In [27], Castilla et al. introduced a family of Wald-type tests for testing the parameters of linear regression models, and these results were later extended to generalized linear regression models in [32,33]. Wald-type tests based on minimum RP estimators in bidimensional normal populations were considered in [34]. Jaenada et al. [35] introduced and studied minimum RP estimators under restricted parameter spaces, which are of great statistical interest in many practical applications such as hypothesis testing. Under the name of γ-entropy, Fujisawa and Eguchi [36] applied the RP to introduce robust estimators for general parametric families.
Motivated by the great performance of the minimum RP estimators across these different statistical models in terms of robustness, we adopt the RP divergence to extend the ICCA procedure.
The rest of the paper is organized as follows. The Rényi's Pseudodistance Canonical Correlation Analysis (RPCCA) is introduced in Section 2, and some of its properties are studied. Next, an estimation design for computing the canonical vectors in practice using RPCCA is described in Section 3. In Section 4, the robustness of the RPCCA is theoretically established. Section 5 describes a permutation test to determine the number of significant canonical variables and thereby provide a dimension reduction method. In Section 6, a Monte Carlo simulation study is carried out to empirically evaluate the performance of the RPCCA and compare the proposed method with the ICCA in terms of estimation accuracy and robustness. An example with real data is studied in Section 6.3. Finally, some conclusions are drawn in Section 7.

Rényi's Pseudodistance Canonical Correlation Analysis
Given two multidimensional random variables X and Y, the RPCCA aims to identify two direction vectors a and b (the canonical vectors), such that the corresponding canonical variables U = a^T X and V = b^T Y are as dependent as possible. Such dependency is measured in terms of RP between their joint distribution and the product of their marginal distributions. The RP of tuning parameter τ between the joint distribution of the bidimensional random variable (U, V) and the product of the marginals, f_U(u) × f_V(v), is given, for τ > 0, by (cf. [26])

d_τ(a, b) = (1/(τ+1)) ln(∫∫ (f_U(u) f_V(v))^{τ+1} du dv) + (1/(τ(τ+1))) ln(∫∫ f_UV(u, v)^{τ+1} du dv) − (1/τ) ln(∫∫ f_UV(u, v) (f_U(u) f_V(v))^τ du dv).
Hence, the RP measures the statistical discrepancy between the joint PDF of the canonical variables, f_UV, and the marginal PDFs' product, f_U × f_V, or, in other words, the loss in accuracy that comes with assuming independence.
For τ = 0, the RP can be defined as the corresponding limit, τ → 0, yielding the Kullback-Leibler divergence:

lim_{τ→0} d_τ(a, b) = d_KL(a, b). (2)

As earlier discussed, independent canonical variables lead to d_τ(a, b) = 0 and, contrarily, strong dependency should result in large RP distances. Then, the RPCCA procedure aims to identify pairwise canonical vectors a_i ∈ R^q and b_i ∈ R^p, i ≤ q ≤ p, maximizing d_τ(a_i, b_i) and, as before, for identification, the canonical variables should have unit variance and be uncorrelated with any previous pair of canonical variables:

a_i^T Σ_X a_i = b_i^T Σ_Y b_i = 1, a_i^T Σ_X a_j = b_i^T Σ_Y b_j = 0, j < i,

where Σ_X and Σ_Y are the variance-covariance matrices of X and Y, respectively.
Note that, by Equation (2), the ICCA procedure presented in [12] is recovered at τ = 0, and so the RPCCA generalizes ICCA.

Remark 1.
Given the random vectors X and Y, RPCCA finds the vectors a_1, b_1 such that a_1^T X and b_1^T Y are maximally related. This maximal relation is measured via d_τ(a_i, b_i), as previously defined. Once these vectors a_1, b_1 are obtained, the procedure looks for a new pair of vectors a_2, b_2 such that a_1 and a_2 are uncorrelated (and the same applies to b_1, b_2), and a_2^T X and b_2^T Y are maximally related. Consequently, d_τ(a_1, b_1) ≥ d_τ(a_2, b_2). Next, the procedure looks for a_3, b_3 uncorrelated with a_1, a_2 and b_1, b_2, respectively, and so on.
When a maximizing pair yields a null divergence, independence arises and the procedure stops. In practice, we will have an estimation of d_τ(a_i, b_i), and we will stop the procedure if this value does not exceed a certain threshold. This will be applied in Section 5 in order to determine the number of components.
Let us consider, again, the example described in the Introduction, where X = (X_1, X_2)^T and Y = (Y_1, Y_2)^T and Y_1 is related non-linearly to X_1. The true value of the first pair of canonical vectors is then a_1 = b_1 = (1, 0)^T. Under the described setup, the joint density of (a_1^T X, b_1^T Y) differs from the product of its marginals and, because of the properties of the divergence, d_τ(a_1, b_1) > 0. The last inequality holds because the RP divergence, d_τ(·, ·), only reaches the value zero if both arguments coincide, as discussed in Section 1 (see [26] for more details). In this case, RPCCA should identify a pair a_1, b_1 with a non-zero informational coefficient of canonical correlation defining the canonical variables a_1^T X and b_1^T Y. For practical use of RPCCA, it is interesting to note that RPCCA is equivariant under invertible linear transformations. This equality does not hold for other extensions of ICCA, where proportionality arises instead. Proposition 1. Consider two random variables U and V, and take R = cU and S = eV, where c and e are non-zero real numbers (indeed, the result also holds if we consider two random vectors U, V and take R = CU and S = DV, where C and D are two invertible matrices; in this case, the RP is computed considering multidimensional integrals). Then, the RP between the joint density of (R, S) and the product of its marginals coincides with that between the joint density of (U, V) and the product of its marginals. The next result establishes that the RPCCA reduces to CCA in the case of normal distributions.

Proposition 2.
In the case of normal distributions, RPCCA coincides with CCA.
Proof. Consider normal populations, i.e., assume that the multidimensional random variables X and Y are jointly normally distributed. Then, the bidimensional random variable (U, V) = (a^T X, b^T Y) follows a bidimensional normal distribution, with mean vector and variance-covariance matrix determined by µ_1, µ_2, σ_1, σ_2 and the correlation coefficient ρ. On the other hand, the marginal densities f_{µ_1,σ_1}(u) and f_{µ_2,σ_2}(v) of a^T X and b^T Y, respectively, are normal distributions. We first compute the RP between f_{µ_1,σ_1}(u) × f_{µ_2,σ_2}(v) and f_{µ_1,µ_2,σ_1,σ_2,ρ}(u, v). Considering the results obtained in the Supplementary Materials (Appendix A) of [6], we obtain a closed-form expression for d_τ(a, b). For fixed τ, this expression depends on a and b only through ρ. Moreover, it is not difficult to show that d_τ(a, b) is an increasing function of ρ² for any τ (see Figure 1 for τ = 0.1, 0.3 and 0.9). To show this, it suffices to see that a certain function f_τ(·) of ρ² appearing in the closed-form expression is increasing in ρ², and so is its logarithmic transform. Taking derivatives with respect to ρ², the derivative is seen to be non-negative, and the result holds. Hence, maximizing d_τ(a, b) over unit-variance canonical variables is equivalent to maximizing ρ², which is precisely the classical CCA criterion. Thus, RPCCA is equivalent to classical CCA in the case of random normal variables.
It can be seen that d_τ(a, b) is an increasing function of ρ² for any τ > 0 under normal distributions; hence, RPCCA also extends CCA, with the tuning parameter τ determining the sharpness of the distance d_τ(a, b) (or of the function f_τ(·) in the proof of Proposition 2).
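The monotonicity in ρ² can also be checked numerically. The sketch below assumes the standard RP form between the product of N(0,1) marginals and a standard bivariate normal joint with correlation ρ, and evaluates it by grid quadrature (the proof uses a closed form instead; grid limits and resolution are illustrative):

```python
import numpy as np

def rp_bivnormal(rho, tau, lim=6.0, m=241):
    """RP of order tau between the product of N(0,1) marginals and a standard
    bivariate normal joint with correlation rho, via grid quadrature.
    A numeric sketch only; Proposition 2 works with a closed-form expression."""
    u = np.linspace(-lim, lim, m)
    h = u[1] - u[0]
    U, V = np.meshgrid(u, u)
    det = 1.0 - rho**2
    f = np.exp(-(U**2 - 2*rho*U*V + V**2) / (2*det)) / (2*np.pi*np.sqrt(det))
    g = np.exp(-(U**2 + V**2) / 2) / (2*np.pi)          # product of marginals

    def I(dens):                                        # log of a grid integral
        return np.log((dens * h * h).sum())

    return (I(g**(1+tau))/(1+tau) + I(f**(1+tau))/(tau*(1+tau))
            - I(f * g**tau)/tau)

for tau in (0.1, 0.3, 0.9):
    print(tau, [round(rp_bivnormal(r, tau), 4) for r in (0.0, 0.3, 0.6, 0.9)])
```

For every τ, the printed sequence starts at 0 (independence, ρ = 0) and increases with |ρ|, matching the monotonicity used in the proof.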

Consistency
We now focus on the practical side of the RPCCA estimation. In practice, the PDFs f UV , f U and f V are unknown; thus, they should be empirically estimated. Likewise, the RPCCA should be formulated for an empirical setup.
The RP, d_τ(a, b), can be expressed in terms of expected values as

d_τ(a, b) = (1/(τ+1)) ln(E[f_U(U)^τ] E[f_V(V)^τ]) + (1/(τ(τ+1))) ln(E[f_UV(U, V)^τ]) − (1/τ) ln(E[(f_U(U) f_V(V))^τ]),

where all expectations are taken under the joint distribution of (U, V). This interpretation of d_τ(a, b) makes the definition of its empirical estimator easier. Let (x_i, y_i), i = 1, . . ., n, be a random sample of size n from the multidimensional random variables (X, Y), and set u_i = a^T x_i and v_i = b^T y_i. Then, an empirical estimator of d_τ(a, b) is given by

d̂_τ(a, b) = (1/(τ+1)) ln[(1/n ∑_i f̂_U(u_i)^τ)(1/n ∑_i f̂_V(v_i)^τ)] + (1/(τ(τ+1))) ln[1/n ∑_i f̂_UV(u_i, v_i)^τ] − (1/τ) ln[1/n ∑_i (f̂_U(u_i) f̂_V(v_i))^τ], (4)

where f̂_U, f̂_V and f̂_UV are kernel density estimators of the corresponding PDFs. For the PDF estimators, we will use the univariate Gaussian kernel with bandwidths a_n^j = 1.06 n^{−0.2} s_j for the marginals and b_n^j = n^{−1/6} s_j for the joint density, j = 1, 2, with s_1 and s_2 the corresponding sample standard deviations. This kernel function was proposed in [37] and adopted in many other extensions of ICCA, but other types of kernels could be considered instead, as long as they satisfy the conditions of Lemma 1 below. (When the distribution is known up to a parameter value, f_θ, this information should be taken into account. Hence, the procedure would be the usual one in these situations: first, we estimate the parameter θ of the distribution by θ̂ and then consider the distribution f_θ̂ with the estimated parameter; next, we use f_θ̂ instead of the kernel estimate f̂.) Other interesting results about kernel density estimation can be found in [38,39].
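A sketch of one natural empirical RP estimator, replacing each integral by a sample mean over Gaussian kernel density estimates; the bandwidth constants follow the rules quoted in the text, but the exact estimator layout (and which bandwidth serves the joint versus the marginals) is an assumption, not the paper's verbatim code:

```python
import numpy as np

def gauss_kde(points, data, bws):
    """Product-Gaussian KDE of `data` (n x d), evaluated at `points` (m x d)."""
    z = (points[:, None, :] - data[None, :, :]) / bws
    kern = np.exp(-0.5 * (z**2).sum(axis=2))
    return kern.mean(axis=1) / np.prod(bws * np.sqrt(2.0 * np.pi))

def rp_hat(u, v, tau):
    """Empirical RP between the joint and the product of marginals of (U, V),
    replacing integrals by sample means (one natural estimator, an assumption).
    Bandwidths: 1.06 n^{-0.2} s_j (marginals), n^{-1/6} s_j (joint)."""
    n = len(u)
    s = np.array([u.std(ddof=1), v.std(ddof=1)])
    uv = np.column_stack([u, v])
    fu = gauss_kde(u[:, None], u[:, None], 1.06 * n**-0.2 * s[:1])
    fv = gauss_kde(v[:, None], v[:, None], 1.06 * n**-0.2 * s[1:])
    fuv = gauss_kde(uv, uv, n**(-1/6) * s)
    t1 = (np.log(np.mean(fu**tau)) + np.log(np.mean(fv**tau))) / (1 + tau)
    t2 = np.log(np.mean(fuv**tau)) / (tau * (1 + tau))
    t3 = np.log(np.mean((fu * fv)**tau)) / tau
    return t1 + t2 - t3

rng = np.random.default_rng(7)
u = rng.standard_normal(400)
print(rp_hat(u, u**2 + 0.2 * rng.standard_normal(400), tau=0.3))  # dependent: large
print(rp_hat(u, rng.standard_normal(400), tau=0.3))               # independent: near 0
```

The dependent pair yields a clearly larger estimated divergence than the independent pair, which is the signal the RPCCA maximization exploits.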
Then, the estimated canonical vectors based on the RP with tuning parameter τ can be computed as

(â_n^τ, b̂_n^τ) = arg max_{a,b} d̂_τ(a, b) subject to a^T Σ̂_11 a = b^T Σ̂_22 b = 1,

where Σ̂_11 and Σ̂_22 are the empirical estimators of the variance-covariance matrices of X and Y, respectively. We next establish the consistency of the estimated canonical vectors under some regularity conditions. That is, we will prove that the estimated canonical vectors â_n^τ, b̂_n^τ converge, for large sample sizes, to the true canonical vectors defining the underlying functional relationship. For such a result, we need the following lemma, whose proof can be found in [40].
Lemma 1. Let (x_i, y_i), i = 1, . . ., n, be independent replications of the multidimensional random variables (X, Y). Consider a sequence {a_n}_{n∈N} such that 0 < a_n and lim_{n→∞} a_n = 0. Consider a kernel function K of bounded variation (consider a function g: R^k → R, and let P be the set of finite partitions of R^k into rectangles p = {[x_j, y_j), j = 1, . . ., u_p}; then, g is said to be of bounded variation if its total variation over the partitions in P is uniformly bounded). Assume further that f_U(a^T x) is uniformly continuous in a and x, f_V(b^T y) is uniformly continuous in b and y, and f_UV(a^T x, b^T y) is uniformly continuous in a, x, b and y. Then, the kernel density estimators converge uniformly, in probability, to the corresponding true densities. Note that the Gaussian kernel functions defined in Equations (6)-(8) satisfy the conditions of Lemma 1. Of course, any other choice of kernel should also satisfy these regularity conditions. Now, let us define, for any real value b > 0, the set of indices such that the observations a^T x_i and b^T y_i, i = 1, . . ., n, have positive densities, and denote by n_b the number of data outside this set. The next result establishes the consistency of the estimators (â_n, b̂_n) maximizing d̂_τ(a, b).
Further, assume that the maximum (a*, b*) is unique. Then, (â_n, b̂_n) converges in probability to (a*, b*). Proof. Take ε > 0 and b > 0. By identification, we can assume â_n^T Σ̂_11 â_n = b̂_n^T Σ̂_22 b̂_n = 1. Let us suppose, by contradiction, that (â_n, b̂_n) does not converge in probability to (a*, b*).
Hence, there exists a subsequence of {(â_n, b̂_n)} (still denoted by (â_n, b̂_n) to avoid cumbersome notation) and a point (a_0, b_0) such that (â_n, b̂_n) → (a_0, b_0). Now, applying Lemma 1, the kernel density estimates converge uniformly. Thus, for τ > 0 and n large enough, the estimated densities can be written as the true densities up to error terms δ_1i, δ_2i, δ_3i with |δ_1i|, |δ_2i|, |δ_3i| < ε. Applying Equation (10), the first term of the estimated divergence converges, and the same can be performed for the two other terms; as |δ_3i| < ε, the cross term is controlled as well. On the other hand, as b → 0 and n_b/n → 0, the contribution of the observations with vanishing estimated densities becomes negligible. As ln is continuous, applying the Strong Law of Large Numbers, it follows that d̂_τ^n(a_0, b_0) converges to d_τ(a_0, b_0). On the other hand, d̂_τ^n(â_n, b̂_n) ≥ d̂_τ^n(a*, b*) because (â_n, b̂_n) is the optimum by definition.
Taking limits, d_τ(a_0, b_0) ≥ d_τ(a*, b*), so (a_0, b_0) is also an optimum. Hence, as (a*, b*) is the only maximum, we conclude that (a*, b*) = (a_0, b_0), a contradiction.

Robustness
To motivate the inherent robustness property of the RPCCA procedure, we examine the behavior of the estimated divergence in Equation (4) for small values of the tuning parameter. The presented heuristic argument was first discussed in [6] for the density power divergence generalization of ICCA. Consider the estimated RP and let the tuning parameter τ ↓ 0. Taking limits in the estimated divergence defined in Equation (4), the first term vanishes, and the remaining two terms determine the limit. We first study the limiting behavior of the first of these terms. For τ ↓ 0, this term is a limit of the indeterminate form 0/0. Applying L'Hôpital's rule, and noting that the denominator tends to 1 as τ ↓ 0, we obtain an expression whose summands are weighted by the estimated densities raised to the power τ. Similarly, the second term admits an analogous L'Hôpital approximation. This approximation is valid for τ close to 0 and, at τ = 0, reduces to the empirical Kullback-Leibler divergence. This implies that d̂_τ^n(f_U × f_V, f_UV) can be seen as a weighted version of the empirical Kullback-Leibler divergence, with weights depending on the estimated densities f̂_U^τ, f̂_V^τ and f̂_UV^τ evaluated at the observations. Therefore, if the observations x_i, y_i, or both, are outliers, the corresponding density estimates decrease, so these data points receive less weight in the estimated distance, thus making Rényi's pseudodistance more robust to outliers than the Kullback-Leibler divergence.
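To see this downweighting numerically, the sketch below compares the implicit weight f̂(x_i)^τ of a gross outlier against that of a typical observation (an illustrative construction with normalized weights; in the paper, the weights arise inside the estimated divergence rather than as an explicit weight vector):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.append(rng.standard_normal(99), 8.0)   # one gross outlier at x = 8

n, tau = len(x), 0.5
bw = 1.06 * n**-0.2 * x.std(ddof=1)           # Gaussian-kernel bandwidth rule
z = (x[:, None] - x[None, :]) / bw
f_hat = np.exp(-0.5 * z**2).mean(axis=1) / (bw * np.sqrt(2.0 * np.pi))

w = f_hat**tau                                # RP-style weights ~ density^tau
w /= w.sum()                                  # normalize for comparison
print(f"outlier weight: {w[-1]:.4f}  typical weight: {np.median(w):.4f}")
```

The outlier sits in a low-density region, so its estimated density, and hence its weight, is several times smaller than that of a typical point; at τ = 0 (Kullback-Leibler), all observations would count equally.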

Testing to Determine the Number of Pairs
In this section, a dimension reduction algorithm is described for determining the number of significant pairs of canonical vectors: the non-parametric sequential test [41,42].
In the classical approach of CCA, the maximum number of pairs (a_i, b_i) is determined by the first index j for which ρ(a_j^T X, b_j^T Y) = 0. That is, the CCA should run until the best estimated pair leads to linear independence. This stopping criterion is naturally extended to the RPCCA formulation by replicating the CCA dimension reduction algorithm, but using the RP divergence as the measure of dependence.
Let us denote by d_τ^i the maximum value achieved at the i-th iteration. The sequence of maxima is decreasing and lower-bounded by 0, the value indicating independence between the estimated canonical variables. Then, a stopping criterion for the maximum number of canonical correlations is naturally determined by the testing problem

H_0: d_τ^i = 0 vs. H_1: d_τ^i > 0.

If H_0 is not rejected, then all subsequent canonical variables from the i-th onward are not significantly related. Otherwise, the relation is significant, and the maximum number of significant canonical correlations is at least i. It is difficult to obtain the exact sample distribution of d_τ^i, but a non-parametric permutation test can be applied, as proposed in [24], for estimating the p-value of the test. Let us explain this procedure in some detail. Suppose there is a relationship between a_i^T X and b_i^T Y for some vectors a_i, b_i, i.e., H_0 does not hold. This means that there exists a function f such that f(a_i^T X) ≈ b_i^T Y. Our procedure will estimate vectors â_i and b̂_i close (!) to a_i and b_i, respectively. Consequently, we expect that, for the sample (X_1, Y_1), . . ., (X_n, Y_n), the estimated relationship approximately holds. This will translate into a large value of d̂_τ^n(â_i^T X, b̂_i^T Y) and, consequently, of the corresponding statistic d_τ^i. Now, if we consider a permutation of the data corresponding to X while maintaining the order of the data corresponding to Y, any possible relationship is destroyed, because the permuted X-value assigned to individual i has nothing to do with Y_i. In other words, if we denote the reordered sample of (X_1, . . ., X_n) by (X*_1, . . ., X*_n), then, for any c and d, d̂_τ^n(c^T X*, d^T Y) ≈ 0, reflecting independence.
Consequently, when the procedure looks for optimal vectors on the permuted sample, the resulting values are not expected to reflect a strong relation (because it does not exist), so the permuted maxima will be small. Hence, if H_0 does not hold and a relationship between the canonical variables exists, we expect that, for (almost) any permutation w, d_τ^{i,w} < d_τ^i. On the other hand, if H_0 holds, then c^T X and d^T Y are independent for any c and d. Consequently, even for the best possible estimated vectors â_i, b̂_i, we will obtain a small value of d_τ^i. When considering a permutation of the values corresponding to X, independence arises again, and hence, in this case, we expect d_τ^{i,w} to be of the same order as d_τ^i. Of course, the number of possible permutations is n!, and this is not affordable for large values of n. Hence, we consider just a subset of randomly chosen permutations. Then, if d_τ^{i,w} denotes the value of the index corresponding to the w-th randomly permuted sample, the estimated p-value of the test is given by

p̂ = (1/R) #{w : d_τ^{i,w} ≥ d_τ^i},

where R denotes the number of permutations considered. Yin [12] used R = 1000 for a permutation test for ICCA. If the p-value is smaller than a certain significance level, then the null hypothesis d_τ^i = 0 should be rejected, implying a significant relationship for the i-th pair of canonical variables, and the process should be repeated for i + 1. Conversely, if the null hypothesis is not rejected, then we should assume that the canonical variables are independent and conclude that there are only i estimated canonical variables exhibiting significant relationships. More details about this dimension reduction method can be seen in [24].
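A minimal sketch of the permutation scheme, with a generic dependence statistic standing in for the maximized RP divergence (the "+1" correction in the p-value is one common convention, slightly different from the plain count/R formula):

```python
import numpy as np

def perm_pvalue(X, Y, stat, R=199, rng=None):
    """Permutation p-value for H0: no relation between the X- and Y-samples.
    `stat` is any dependence measure of the paired sample; in the paper it is
    the maximized estimated RP divergence d_tau^i (left abstract here)."""
    rng = np.random.default_rng() if rng is None else rng
    observed = stat(X, Y)
    exceed = sum(stat(X[rng.permutation(len(X))], Y) >= observed
                 for _ in range(R))
    return (exceed + 1) / (R + 1)      # "+1" convention avoids zero p-values

# toy check: a strong linear relation yields the minimal possible p-value
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = x + 0.1 * rng.standard_normal(100)
corr_stat = lambda u, v: abs(np.corrcoef(u, v)[0, 1])
p = perm_pvalue(x, y, corr_stat, R=199, rng=rng)
print(p)  # 0.005 = 1/(R+1)
```

Permuting X destroys the pairing, so none of the permuted statistics reach the observed one, and the relation is declared significant at any usual level.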

Computational Methods
Consider X = (x_1, . . ., x_n) and Y = (y_1, . . ., y_n) as q × n and p × n matrices containing n observations of the random variables X and Y, respectively. The estimation of the i-th pair of canonical vectors (â_i^τ, b̂_i^τ) based on the RP with tuning parameter τ is computed through the constrained maximization problem

max_{a,b} d̂_τ(a, b) subject to a^T Σ̂_X a = b^T Σ̂_Y b = 1, a^T Σ̂_X â_j = b^T Σ̂_Y b̂_j = 0, j < i,

where Σ̂_X and Σ̂_Y are the empirical estimators of the variance-covariance matrices of the multidimensional random variables X and Y, respectively. The optimization problem constraints can be simplified by scaling the sample matrices to have zero mean and unit variance as follows:

X̃ = Σ̂_X^{-1/2}(X − X̄), Ỹ = Σ̂_Y^{-1/2}(Y − Ȳ),

where X̄ and Ȳ denote the corresponding sample mean vectors. From Proposition 1, the RPCCA is invariant under such linear transformations, and, consequently, the problem constraints are transformed into

ã^T ã = b̃^T b̃ = 1, ã^T ã_j = b̃^T b̃_j = 0, j < i.

For the empirical covariance matrices, degeneracy may arise, resulting in non-invertible matrices. In those cases, we can skip the scaling transform and apply the estimation algorithm under the original restrictions.
From the transformed canonical vectors, ã_i and b̃_i, the estimated canonical vectors in the original space can be easily recovered as

â_i = Σ̂_X^{-1/2} ã_i, b̂_i = Σ̂_Y^{-1/2} b̃_i.

Here, the constrained optimization is carried out iteratively using the non-linear constrained optimizer of the scipy.optimize package in Python, which implements a Sequential Quadratic Programming (SQP) method. The source code for the implementation is publicly available on https://github.com/MariaJaenada/Robust-Canonical-Correlations (Github) (accessed on 29 January 2023).
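A sketch of one such constrained search using scipy's SLSQP solver (not the repository's actual code); the dependence measure is passed in as a function, with squared correlation serving as a stand-in for the estimated RP divergence in the usage example:

```python
import numpy as np
from scipy.optimize import minimize

def first_canonical_pair(X, Y, dep):
    """Sketch of the constrained search for one canonical pair.
    X: n x q and Y: n x p data matrices; dep(u, v) is the dependence measure
    to maximize (the paper maximizes the estimated RP divergence d_tau).
    Equality constraints impose unit sample variance on both canonical variables."""
    q, p = X.shape[1], Y.shape[1]
    Sx, Sy = np.cov(X, rowvar=False), np.cov(Y, rowvar=False)

    neg_dep = lambda w: -dep(X @ w[:q], Y @ w[q:])
    cons = [
        {"type": "eq", "fun": lambda w: w[:q] @ Sx @ w[:q] - 1.0},
        {"type": "eq", "fun": lambda w: w[q:] @ Sy @ w[q:] - 1.0},
    ]
    res = minimize(neg_dep, 0.5 * np.ones(q + p), method="SLSQP",
                   constraints=cons)
    return res.x[:q], res.x[q:]

# usage with squared correlation standing in for the RP measure
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y = np.column_stack([X[:, 0] + 0.1 * rng.standard_normal(200),
                     rng.standard_normal(200)])
dep = lambda u, v: np.corrcoef(u, v)[0, 1] ** 2
a, b = first_canonical_pair(X, Y, dep)
print(np.round(a @ np.cov(X, rowvar=False) @ a, 3))   # ~1.0: constraint holds
```

Subsequent pairs are obtained by appending the orthogonality constraints with respect to the previously estimated vectors.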

Monte Carlo Simulation
We empirically examine the robustness of the RPCCA method through a Monte Carlo simulation. We consider a pair of random vectors, X = (X_1, . . ., X_8) and Y = (Y_1, Y_2, Y_3), whose components satisfy a linear and a non-linear relationship as specified in Equation (14). The rest of the variables are independent and defined as follows: X_1, X_2, X_6, X_7 and X_8 are standard normal variables; X_3 follows a chi-square distribution with 7 degrees of freedom; X_4 follows a Student's t distribution with 5 degrees of freedom; X_5 follows a Fisher-Snedecor distribution with 3 and 12 degrees of freedom; finally, Y_3 follows a Student's t distribution with 9 degrees of freedom.
We generate a random sample of the pairs X and Y of size n = 100, and we estimate the pairs of canonical vectors â_i and b̂_i, i = 1, 2, such that the random variables U_i = â_i^T X and V_i = b̂_i^T Y are maximally related. To examine the performance of the RPCCA method under contamination, we randomly switch the functional relationships in Equation (14) for ε% of the observations, with ε = 5, 10, 15 and 20 denoting the contamination proportion. That is, for a random ε% of the observations, the values of Y_1 and Y_2 are exchanged, generating orthogonal outliers; the functions defining Y_1 and Y_2 are orthogonal to each other. Therefore, this contamination worsens both relationships at the same time in orthogonal directions. We repeat the simulations over R = 500 replications and compute averages of the following performance measures. We quantify the accuracy of the estimates with the absolute correlations between the estimated and true canonical variables, |ρ(a_i, â_i)| = |ρ(a_i^T X, â_i^T X)| and |ρ(b_i, b̂_i)| = |ρ(b_i^T Y, b̂_i^T Y)|. Additionally, to evaluate the robustness of the method, we compute the L_2-norm between the canonical vectors fitted under uncontaminated and contaminated data, â and â_c, L_2(â, â_c) = ||â − â_c||_2, as well as the projection of â_c onto the orthogonal subspace spanned by the uncontaminated estimate â, P_2(â, â_c) = ||(I − â â^T) â_c||_2. The smaller the distance measures L_2(â, â_c) and P_2(â, â_c), the more stable the estimate is, implying that the estimates are not largely affected by the contamination; hence, the corresponding method is more robust. Summarizing, the correlations between true and estimated canonical variables, ρ(·, ·), represent the accuracy of the method, whereas the distance measures between canonical vectors estimated from pure and contaminated data, L_2 and P_2, represent its robustness.
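The two stability measures can be written directly; a small sketch, where `a_hat` and `a_cont` are hypothetical stand-ins for the estimates on clean and contaminated data:

```python
import numpy as np

def l2_dist(a_pure, a_cont):
    """L2 distance between canonical vectors estimated on clean vs contaminated data."""
    return np.linalg.norm(a_pure - a_cont)

def p2_dist(a_pure, a_cont):
    """Norm of the projection of the contaminated estimate onto the orthogonal
    complement of the (unit-norm) clean estimate: ||(I - a a^T) a_c||."""
    a = a_pure / np.linalg.norm(a_pure)
    return np.linalg.norm(a_cont - a * (a @ a_cont))

# hypothetical estimates: a mild rotation away from the clean direction
a_hat  = np.array([1.00, 0.00, 0.00])
a_cont = np.array([0.98, 0.15, 0.10])
print(round(l2_dist(a_hat, a_cont), 4), round(p2_dist(a_hat, a_cont), 4))
```

P_2 ignores any component of the contaminated estimate along the clean direction, so it isolates the change in direction, while L_2 also picks up changes in scale.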
Tables 1 and 2 present all performance measures for the RPCCA method over a grid of tuning parameters ranging from 0 (corresponding to ICCA) to 0.8. All methods perform suitably well in terms of accuracy, achieving high absolute correlations between true and estimated canonical variables, even under contaminated scenarios. However, the linear relationship in the first component is captured less well by the ICCA in the presence of contamination, as shown by the lower absolute correlation between the canonical variables, ρ(â_1^c, b̂_1^c). Moreover, the RPCCA method with positive values of the tuning parameter produces more stable estimations of the canonical vectors, having smaller P_2 and L_2 distances between the uncontaminated and contaminated estimated canonical vectors in both components, thus demonstrating the advantage in terms of robustness. Although the differences in performance are not dramatic, the gain in robustness with very little loss of accuracy with respect to the ICCA makes the RPCCA very attractive.
On the other hand, if the underlying relationship is easily identified, the proposed robust RPCCA performs as well as the ICCA under pure data and outperforms the ICCA in the presence of contamination (Table 1). However, for τ > 0, some loss of accuracy in the relationship identification under pure data is unavoidable (although not very significant); hence, the tuning parameter should be chosen sufficiently close to zero (from the literature, less than 1) to provide an adequate compromise between efficiency loss and robustness gain. Moderate values of the tuning parameter, around 0.3, offer the best compromise, producing canonical estimators that are robust against data contamination with a small loss of efficiency with respect to the ICCA in the absence of contamination.

Real Data Application
We finally illustrate the applicability of our method with real-life data on the heredity of head shape in men. For this purpose, we use the well-known dataset from Frets [43], which records the head length and head breadth of the first and second sons of n = 25 families. The first and second sets of variables, X and Y, respectively, thus each have dimension 2 and represent the head length and head breadth of the corresponding son. From this dataset, we want to analyze whether there is a relationship between the head shapes of the male offspring. The data have been widely used in the literature; Mardia et al. [2] and Yin [12] analyzed the canonical correlations between the first and second sons' head shapes using CCA and ICCA, respectively, and both found one significant pair of canonical variables with a strong linear relationship. Figure 2 shows the plots of the first (left) and second (right) pairs of canonical variables for the head data estimated by RPCCA with τ = 0 (top) and τ = 0.5 (bottom). As shown, both methods coincide on the first pair of canonical variables, x_1 and y_1, with linear correlation coefficients of ρ = 78.67% (τ = 0) and ρ = 76.86% (τ = 0.5), as indicated on the corresponding plots. For the second pair, the two procedures estimate very different canonical variables, and neither finds a clear functional relationship between linear combinations of the variables (as shown in Figure 2). Thus, we also conclude that there is only one significant pair of canonical variables. Additionally, to illustrate the advantage of our method in terms of robustness (with a small loss of efficiency), we contaminate a single observation (obs. 24) in both vector variables, generating an outlying observation.
Then, we apply RPCCA with τ = 0 (corresponding to ICCA) and τ = 0.5 to both the uncontaminated and the contaminated data. Table 3 presents the $P_2$ and $L_2$ distances between the first pair of canonical vectors (identifying the linear relationship) estimated under uncontaminated and contaminated data, with only one outlying observation. Because the sample size is small, this single outlier heavily influences the ICCA estimate, whereas the RPCCA method with τ = 0.5 shows great stability in the canonical vector estimation. These results illustrate the advantage of the RPCCA in real-life applications: it produces robust estimates of the canonical variables at a small loss of efficiency with respect to the ICCA in the absence of data contamination.
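The stability comparison in this experiment can be sketched as follows. Since the RPCCA estimator is not publicly packaged, a classical CCA fit (SVD of the whitened cross-covariance) stands in for it, and synthetic data of the same shape as the Frets data (25 families, two 2-dimensional groups) stand in for the real measurements; the point is the contaminate-refit-compare pipeline, not the estimator itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_canonical_vectors(X, Y):
    """First pair of canonical vectors via classical CCA: SVD of the
    whitened cross-covariance matrix (a stand-in for RPCCA here)."""
    n = len(X)
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Kx = np.linalg.inv(np.linalg.cholesky(Xc.T @ Xc / (n - 1)))
    Ky = np.linalg.inv(np.linalg.cholesky(Yc.T @ Yc / (n - 1)))
    U, _, Vt = np.linalg.svd(Kx @ (Xc.T @ Yc / (n - 1)) @ Ky.T)
    a = Kx.T @ U[:, 0]
    b = Ky.T @ Vt[0]
    return a / np.linalg.norm(a), b / np.linalg.norm(b)

# Synthetic stand-in for the head-shape data: two correlated 2-D groups.
X = rng.normal(size=(25, 2))
Y = X @ np.array([[0.8, 0.1], [0.2, 0.7]]) + 0.3 * rng.normal(size=(25, 2))

a_clean, _ = first_canonical_vectors(X, Y)

# Contaminate a single observation in both groups, as done with obs. 24.
Xc, Yc = X.copy(), Y.copy()
Xc[23] += 10.0
Yc[23] -= 10.0
a_cont, _ = first_canonical_vectors(Xc, Yc)

# P2 stability of the first canonical vector (smaller = more robust);
# it is invariant to the sign indeterminacy of the estimates.
p2 = np.linalg.norm((np.eye(2) - np.outer(a_clean, a_clean)) @ a_cont)
```

For a non-robust fit such as this classical one, a single gross outlier in n = 25 observations can move `p2` noticeably, which is exactly the effect Table 3 quantifies for ICCA versus RPCCA.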

Conclusions
We have presented a robust generalization of the ICCA based on RP for identifying linear and non-linear relationships between two sets of variables. We have derived sample versions for estimating the canonical vectors in practice, and we have demonstrated the consistency of these estimators. Further, the robustness advantage of the RPCCA has been examined theoretically and empirically, concluding that the proposed RPCCA offers an appealing alternative to ICCA, competitive in terms of estimation accuracy and more robust against data contamination. The method manages to detect hidden functional relationships between linear combinations of the variables and suitably approximates the true underlying relationships, even under contaminated scenarios. Moreover, a permutation test for determining the number of significant pairs of canonical vectors has been presented. Since the RPCCA is a family indexed by a tuning parameter, a data-driven algorithm for selecting its optimal value is a worthwhile pursuit for future research. Also, the methodology presented here can be extended in future work to identify relationships among more than two sets of variables: the idea is to consider not just two but k random vectors, as in [24], and to seek linear combinations of all of them such that the RP between the marginal distributions and the joint distribution is as large as possible.
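The permutation test mentioned above can be sketched in a few lines. This is a generic sketch, not the paper's procedure: the classical first canonical correlation serves as a stand-in test statistic for the RP-based measure, and the function names are ours. Permuting the rows of Y destroys any association with X while preserving both marginal distributions, which yields a null reference distribution for the observed statistic.

```python
import numpy as np

def first_cc(X, Y):
    """Largest canonical correlation between X and Y (classical CCA,
    used here as a stand-in for the RP-based association measure)."""
    n = len(X)
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Kx = np.linalg.inv(np.linalg.cholesky(Xc.T @ Xc / (n - 1)))
    Ky = np.linalg.inv(np.linalg.cholesky(Yc.T @ Yc / (n - 1)))
    s = np.linalg.svd(Kx @ (Xc.T @ Yc / (n - 1)) @ Ky.T, compute_uv=False)
    return s[0]

def permutation_pvalue(X, Y, n_perm=500, seed=0):
    """Permutation p-value for association between X and Y: shuffling
    the rows of Y breaks the X-Y link, giving null statistics to
    compare against the observed one."""
    rng = np.random.default_rng(seed)
    obs = first_cc(X, Y)
    null = [first_cc(X, Y[rng.permutation(len(Y))]) for _ in range(n_perm)]
    return (1 + sum(s >= obs for s in null)) / (1 + n_perm)
```

In a sequential version, one would test the canonical pairs one at a time, removing the contribution of the pairs already accepted before testing the next, and stop at the first non-significant pair.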
Funding: This work was partially supported by the Spanish Grants PID2021-124933NB-I00 and FPU/018240.

Institutional Review Board Statement: Not applicable.
Data Availability Statement: Not applicable.