Sparse HJ Biplot: A New Methodology via Elastic Net

The HJ biplot is a multivariate analysis technique that represents both individuals and variables in a space of reduced dimensions. To adapt this approach to massive datasets, it is necessary to implement new techniques that reduce the dimensionality of the data and improve interpretation. For this reason, we propose a modern approach to obtaining the HJ biplot, called the elastic net HJ biplot, which applies the elastic net penalty to improve the interpretation of the results. The algorithm is novel in that it is the first attempt within the biplot family to use regularisation methods to obtain modified loadings that optimise the results. To complement the proposed method and give it practical support, a package called SparseBiplots has been developed in the R language. This package fills a gap in the context of the HJ biplot through penalised techniques since, in addition to the elastic net, it also includes the ridge and lasso penalties for obtaining the HJ biplot. To complete the study, a practical comparison is made with the standard HJ biplot and the disjoint biplot, and some results common to these methods are analysed.


Introduction
Recently, the variety and the rapid growth of datasets have led to an increase in the amount of information in many disciplines and fields of study. Due to this increase in the volume of data, a statistical approach based on dimension reduction is an essential tool to project the original data onto a subspace of lower dimensions, in such a way that it is possible to capture most of the variability. This representation can be approximated by applying multivariate techniques including principal component analysis (PCA).
PCA has its origins in works developed by [1], but more substantial development was carried out by [1,2]. A more current reference can be found in [3]. PCA is traditionally undertaken through the singular value decomposition (SVD) technique of [4].
In PCA, data are projected on orthogonal axes of maximum variability in a space of reduced dimensions, usually a plane. Thus, each principal component is a linear combination of the initial variables and their contribution to each component is established. The coefficients of the combinations, which are called loadings and are usually different from zero, generate the main drawback of the PCA: its interpretation.
Several alternatives have been proposed to improve the interpretation of the results, ranging from rotation techniques to the imposition of restrictions on factor loadings. Initially, Hausman [5] proposed restricting the values that can be assigned to the loadings. The main objective of this paper is to propose a new biplot methodology, called the Sparse HJ biplot, as an alternative method to improve the interpretation of the information provided by high-dimensional data. The suggested algorithm adapts these restrictions to penalise the loadings and produce sparse components; that is, each component is a combination of only the relevant variables. In addition to the proposed method, a package was implemented in the R language to give practical support to the new algorithm.
The paper is organised into the following sections. In Section 2.1, descriptions of the approach proposed by [27] and HJ biplot [28] are presented and their properties are synthesised. Next, Section 2.2 describes the disjoint biplot (DBiplot) [29], a recent adaptation to the study of disjoint axes in the HJ biplot. Section 3 presents our main contribution, a method called Sparse HJ biplot, in which the elastic net penalty is applied to obtain zero loadings. Finally, in Section 4 this method is applied to a dataset and compared with results obtained by other methods.

Materials and Methods
In multivariate analysis, when the aim is to represent variables and individuals in the same coordinate system, biplot methods produce visually interpretable results. Although biplot methods and classical multivariate techniques work with two-way matrices, many scientific disciplines need to integrate techniques for handling large volumes of data, and the advance of science requires techniques that simplify data analysis. In the case of biplot methods, the literature only reports the disjoint biplot technique (described in Section 2.2) for producing axes with zero factorial loadings that facilitate the interpretation of massive data. Therefore, to analyse this type of data, in this paper an innovative multivariate technique, named the sparse HJ biplot, was developed. It is a new alternative to the HJ biplot adapted to the analysis of large datasets that improves the interpretation of the results. The mathematical structure of the algorithm is described in Section 2.3.
To illustrate the techniques used in this paper, we take a sample from data published by The Cancer Genome Atlas [30]. The data subset used is "breast.TCGA", downloaded from the mixOmics R package [31]. We selected the proteomics information, which comprises 150 breast cancer samples and the expression or abundance of 142 proteins. These samples are classified into three groups: Basal, Her2 and Luminal A.

Biplot and HJ-Biplot
Biplot methods [27] provide a low-dimensional graphical representation of a multivariate data matrix (individuals × variables). The two most important biplot factorisations presented by [27] were the GH biplot and the JK biplot. The GH biplot achieves a high-quality representation of the variables (columns), while the JK biplot achieves this high quality for the individuals (rows). Consequently, neither method represents rows and columns simultaneously with optimal quality.
As an alternative that optimises the biplot methods described by [27], the author of [28] proposed a multivariate technique called the HJ biplot. This contribution maximises the representation quality of both rows and columns simultaneously [32] in the same coordinate system. In this way, it is possible to interpret the relationships between individuals and variables at the same time.
From the algebraic point of view, the biplot methods are based on the same principles of PCA and SVD of a data matrix. In a biplot, the data are projected onto orthogonal axes of maximum variability in a space of reduced dimension. The HJ biplot facilitates the interpretation of the positions of rows, columns and row-column relations through the axes, in the same way as correspondence analysis [33][34][35][36].
In the HJ biplot, the algorithm starts with the singular value decomposition of the n×p matrix X defined previously:

X ≅ U D V^T

where
X is the data matrix;
U is the matrix whose columns contain the eigenvectors of X X^T;
V is the matrix whose columns contain the eigenvectors of X^T X;
D is the diagonal matrix containing the singular values of X.
U and V must be orthonormal, that is, U^T U = I and V^T V = I, to guarantee the uniqueness of the factorisation.
The best approximation to X is therefore X ≅ U D V^T. Considering that the matrix X is centred, the column markers in the HJ biplot are taken to be the column markers of the GH biplot, and the row markers are taken to be the row markers of the JK biplot. Thus,

E = U D and G = V D,

so that, with respect to the factorial axes, rows and columns are represented with the same quality. Figure 1 shows the relationships between U and V.
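The marker construction above can be sketched in a few lines of numpy. This is our own minimal illustration (function name and the centring step are our choices, not code from the paper or its package):

```python
import numpy as np

def hj_biplot_markers(X, k=2):
    """Row and column markers of the HJ biplot from the SVD of a centred matrix.

    Both sets of markers are scaled by the singular values (E = U D, G = V D),
    which is what distinguishes the HJ biplot from the GH and JK factorisations.
    """
    Xc = X - X.mean(axis=0)                            # centre the columns
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)  # X = U D V^T
    E = U[:, :k] * d[:k]                               # row markers (individuals)
    G = Vt[:k].T * d[:k]                               # column markers (variables)
    return E, G

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
E, G = hj_biplot_markers(X)
print(E.shape, G.shape)  # (10, 2) (4, 2)
```

Because U and V are orthonormal, the j-th column of E and the j-th column of G both have norm equal to the j-th singular value, reflecting the equal representation quality of rows and columns.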
The rules for the interpretation of the HJ biplot are:
• The proximity between the points that represent the row markers is interpreted as the similarity between them. Consequently, nearby points allow the identification of clusters of individuals with similar profiles.
• The standard deviation of a variable can be estimated by the length (module) of the vector which represents it.
• Correlations between variables can be captured from the angles between vectors: if two variables are positively correlated, their vectors form an acute angle; an obtuse angle indicates a negative correlation; and a right angle indicates that the variables are uncorrelated.
• The points orthogonally projected onto a variable approximate the positions of the sample values in that variable.
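The angle rule can be checked numerically: for centred variables, the cosine of the angle between two variable vectors equals their Pearson correlation exactly in the full space (in the reduced biplot plane it holds approximately). A small illustrative sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.8 * x + 0.6 * rng.normal(size=200)   # positively correlated with x

# Centre both variables: in a column-centred biplot each variable is a vector
xc, yc = x - x.mean(), y - y.mean()

# Cosine of the angle between the two variable vectors
cos_angle = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Pearson correlation computed directly
r = np.corrcoef(x, y)[0, 1]

print(abs(cos_angle - r) < 1e-12)  # True: the two quantities coincide
```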

Disjoint HJ Biplot
An algorithm for HJ biplot representation was proposed by [29,37]. The DBiplot allows for better interpretation of the extracted factorial axes and is a method that constructs disjoint factor axes guaranteeing that each variable of the original data matrix contributes only to one dimension. The algorithm starts with a random classification of the variables in the principal components, and the optimal classification is sought through an iterative procedure leading to maximisation of the explained variability.
The graphical representation in the reduced-dimension subspace is carried out through the HJ biplot. For this, a function called CDBiplot is hosted within the graphical interface biplotbootGUI [37]. The interface has three main functions: CDBiplot builds the DBiplot; the clustering biplot function (CBiplot) represents clusters; and a third alternative, the clustering disjoint biplot, allows the simultaneous representation of both ways, the latter being characterised by disjoint axes.
Concerning the biplot, we have not found any other evidence for the formulation of alternative algorithms that penalise and contract the loadings of factorial axes to enhance the interpretation of the results. Because of this, we propose a new methodology for the HJ biplot that adapts the restrictions on the principal components applying the elastic net penalty. This new approach is called the Sparse HJ biplot.

Sparse HJ Biplot
The elastic net method for regression analysis, which combines the ridge and lasso regularisation techniques, was presented by [14]. This method penalises the size of the regression coefficients based on the l1 and l2 norms, as follows:

β̂ = argmin_β ||y − Xβ||² + λ2 ||β||² + λ1 ||β||1.

The Sparse HJ biplot proposes a solution to the problem of obtaining a linear combination of the variables, determined by a vector of sparse loadings, that maximises the data variability or, equivalently, minimises the reconstruction error (E) between X and its low-rank reconstruction. Based on this approach, the interpretability of the (sparse) axes obtained is greatly improved. In this work, the elastic net regularisation method is implemented in the HJ biplot, combining the lasso and ridge techniques. The formulation of the HJ biplot as a regression problem imposes restrictions on the factorial loadings to produce modified axes.
Since the HJ biplot does not reproduce the starting data, a factor is introduced to make this recovery possible. Using the first k factorial axes, the matrices A = [α1, ..., αk] and B = [β1, ..., βk] are defined, and the modified loadings for the biplot are derived from the elastic net regularisation method. For any λ2 > 0:

(Â, B̂) = argmin_{A,B} ||X − X B A^T||² + λ2 Σ_j ||βj||² + Σ_j λ1,j ||βj||1,

subject to A^T A = I_{k×k}, where λ1,j is the lasso penalty parameter that induces sparsity (sparsity: the condition of the penalty that performs automatic variable selection, setting sufficiently small coefficients to zero) and λ2 is the regularisation parameter that contracts the loadings. ||·|| denotes the l2 norm and ||·||1 denotes the l1 norm. This problem can be solved by alternating the optimisation between A and B, using the LARS-EN algorithm [14].
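As an illustration of the elastic net criterion itself (not the LARS-EN solver used in the paper), a minimal coordinate-descent sketch in numpy; variable names and penalty values are our own:

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator, the proximal map of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def elastic_net(X, y, lam1, lam2, n_iter=500):
    """Minimise ||y - X b||^2 + lam2 ||b||^2 + lam1 ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]      # partial residual excluding j
            z = X[:, j] @ r
            b[j] = soft_threshold(z, lam1 / 2) / (col_sq[j] + lam2)
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
beta_true = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

b = elastic_net(X, y, lam1=10.0, lam2=1.0)
print(np.round(b, 2))  # the two large coefficients survive (slightly shrunk),
                       # the irrelevant ones collapse toward zero
```

The lasso part (λ1) zeroes out small coefficients, while the ridge part (λ2) shrinks the surviving ones, which is exactly the behaviour the sparse HJ biplot exploits on the loadings.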
For fixed A, B is obtained by solving, for each j, the problem

β̂j = argmin_β ||X αj − X β||² + λ2 ||β||² + λ1,j ||β||1,

so each β̂j is an elastic net estimator. For fixed B, the penalty part can be ignored and ||X − X B A^T||² is minimised subject to A^T A = I. This is a reduced-rank Procrustes problem, and the solution is provided by the SVD (X^T X) B = U D V^T, with Â = U V^T.
Recently, [38] proposed a solution to the optimisation problem using the variable projection method. The main characteristic of this method is that it partially minimises over the orthogonally constrained variables.
The steps for the implementation of the elastic net regularisation method in the HJ biplot are detailed in the following algorithm (see Algorithm 1).

Algorithm 1: Sparse HJ biplot algorithm using elastic net regularisation.
1. Consider an n×p data matrix X.
2. Set a tolerance value (1 × 10⁻⁵).
3. Transform the data (centring or standardising).
4. Decompose the data matrix via SVD.
5. Initialise A as the loadings of the first k components, V[, 1:k].
6. For fixed A, compute each βj as the elastic net estimate β̂j = argmin ||Xαj − Xβ||² + λ2||β||² + λ1,j||β||1.
7. Update A via the SVD of X^T X B: X^T X B = U D V^T, Â = U V^T.
8. Update the difference between A and B, difAB.
9. Repeat steps 6 to 8 until difAB < tolerance.
10. Normalise the columns, obtaining V̂_EN.
11. Calculate the row markers and column markers.
12. Plot the elastic net HJ biplot obtained in the previous steps.
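The alternating loop above can be sketched in numpy. This is our own minimal reimplementation for illustration (coordinate descent in place of LARS-EN, a simplified stopping rule, and made-up penalty values), not the SparseBiplots code:

```python
import numpy as np

def soft(z, g):
    """Soft-thresholding, the proximal operator of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def sparse_hj_biplot(X, k=2, lam1=5.0, lam2=1.0, tol=1e-5, max_iter=100):
    Xc = X - X.mean(axis=0)                             # centre the data
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)   # SVD of X
    A = Vt[:k].T                                        # initial loadings
    B = A.copy()
    G = Xc.T @ Xc                                       # Gram matrix X^T X
    col_sq = np.diag(G).copy()
    for _ in range(max_iter):
        # elastic net update of each column of B by coordinate descent
        for j in range(k):
            target = G @ A[:, j]                        # X^T X alpha_j
            for _ in range(20):                         # a few inner sweeps
                for i in range(B.shape[0]):
                    z = target[i] - G[i] @ B[:, j] + col_sq[i] * B[i, j]
                    B[i, j] = soft(z, lam1 / 2) / (col_sq[i] + lam2)
        # Procrustes update of A from the SVD of X^T X B
        Us, _, Vs = np.linalg.svd(G @ B, full_matrices=False)
        A_new = Us @ Vs
        dif_AB = np.max(np.abs(A_new - A))              # convergence check
        A = A_new
        if dif_AB < tol:
            break
    V_en = B / np.linalg.norm(B, axis=0)                # normalised sparse loadings
    Z = Xc @ V_en                                       # sparse components (rows)
    return V_en, Z

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 6))
X[:, :3] += rng.normal(size=(40, 1)) * 2.0   # a correlated block of variables
V_en, Z = sparse_hj_biplot(X, k=2)
print(V_en.shape, Z.shape)  # (6, 2) (40, 2)
```

Row and column markers can then be built on the sparse axes as in the standard HJ biplot; the penalties lam1 and lam2 control how many loadings are driven to zero.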
The following scheme (Figure 2) presents the steps describing the application of the regularisation methods in the HJ biplot, which lead to the modified axes.

Explained Variance by the Sparse HJ Biplot
In the HJ biplot, orthogonal loadings and uncorrelated axes are obtained from the transformation of the original variables. Both conditions are met since, for a covariance matrix D = X^T X, we have V^T V = I and V^T D V is a diagonal matrix. Taking Ẑ as the estimated sparse PCs, the total explained variance is determined by tr(Ẑ^T Ẑ).
Although sparse PCs are capable of producing orthogonal loadings, their components are correlated [12]. Under these conditions, it is not appropriate to calculate the explained variance in the same way as in an ordinary biplot, since the true variance would be overestimated.
In this analysis, the aim is that each component should be independent of the previous ones; therefore, if linear dependence exists, it must be eliminated. Among the alternatives available to obtain the adjusted total variance for sparse components, [13] suggest using projection vectors to remove the linear dependence. These authors denote by Ẑ_{j·1,...,j−1} the residual of fitting Ẑj on Ẑ1, ..., Ẑ_{j−1}, as follows:

Ẑ_{j·1,...,j−1} = Ẑj − H_{1,...,j−1} Ẑj,

where H_{1,...,j−1} is the projection matrix onto Ẑ1, ..., Ẑ_{j−1}. Therefore, the adjusted variance of Ẑj is ||Ẑ_{j·1,...,j−1}||², and the total explained variance is determined as Σ_{j=1}^{k} ||Ẑ_{j·1,...,j−1}||². When the estimated sparse components Ẑ are uncorrelated, this formula matches tr(Ẑ^T Ẑ).
The QR decomposition Ẑ = QR, in which Q is an orthonormal matrix and R is an upper triangular matrix, provides a simpler way to estimate the adjusted variance: ||Ẑ_{j·1,...,j−1}||² = R²_{jj}, and the total explained variance is then Σ_{j=1}^{k} R²_{jj}.
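The equivalence between the projection-residual definition and the QR shortcut can be verified numerically. The perturbed loadings below are a made-up example whose components are deliberately correlated:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 5))
Xc = X - X.mean(axis=0)

# Perturb the true loadings slightly so the resulting components are correlated
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
V_hat = Vt[:2].T + 0.1 * rng.normal(size=(5, 2))
Z = Xc @ V_hat                      # estimated (correlated) sparse components

# Adjusted variance via QR: Z = QR, adjusted variance of component j is R[j, j]^2
Q, R = np.linalg.qr(Z)
adj_var = np.diag(R) ** 2

# The same quantities via explicit projection residuals
z1, z2 = Z[:, 0], Z[:, 1]
resid2 = z2 - z1 * (z1 @ z2) / (z1 @ z1)   # z2 with z1's contribution removed
print(np.isclose(adj_var[0], z1 @ z1), np.isclose(adj_var[1], resid2 @ resid2))
# True True
```

The QR route avoids building each projection matrix H explicitly, which is why it is the simpler estimator in practice.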

Software
To give practical support to the new algorithm and to provide a new tool that enables the use of the proposed method, a package was developed in the R programming language (R Core Team, 2021), called SparseBiplots [39].

This package includes a collection of functions that allow multivariate data to be represented in a low-dimensional subspace using the HJ biplot methodology. The package implements the HJ biplot and three new techniques to reduce the size of the data, select variables and identify patterns. It performs the HJ biplot adapting restrictions on the loadings matrix to shrink loadings and/or set them to zero, using the ridge, lasso and elastic net regularisation methods.
Each of the implemented techniques returns the eigenvalues, the explained variance, the loadings matrix, and the coordinates of the individuals and of the variables. The package also produces the graphical representation in two selected dimensions using the ggplot2 grammar of graphics [40].
This package combines an advanced methodology, the power of the R programming language and the elegance of ggplot2 to explain the results clearly and improve the informative capacity of the data. It is a new alternative for analysing data from different sources, such as medicine, chromatography, spectrometry or psychology.

Illustrative Example
In order to illustrate the new algorithm and to compare the results with the standard HJ biplot and the DBiplot, we consider the subset of data explained in Section 2.
The analysis of the data begins with the calculation of the loadings matrix associated with the factorial axes of each of the three methods, to compare the contributions of the proteins to the new axes (see Table 1). In this way, a technical meaning is given to each axis, facilitating the interpretation of the results. No sparsity is induced in the HJ biplot loadings matrix: each factorial axis is obtained as a linear combination of all the proteins, making the characterisation of each axis difficult. The DBiplot, on the other hand, generates disjoint factorial axes, where each protein contributes to a single factorial axis; this facilitates the grouping of proteins with similar behaviour, making the axes largely characterisable. However, the fact that each protein contributes its information entirely to a single axis severely limits the variability explained by the model. In the new proposal, the sparse biplot decreases the importance of the proteins that contribute the least information to each axis, producing sparse factorial axes with some loadings equal to zero because of the penalty imposed. In contrast to the DBiplot, variables can contribute information to more than one axis; this does not limit the grouping of variables on different axes and instead produces axes that may share some similarities while remaining distinct in their characterisation. The three techniques were then plotted using the type of cancer as a grouping factor for the samples.
In the HJ biplot (Figure 3), a slight pattern can be observed between the three types of cancer and the proteins, but it is not entirely clear. Moreover, due to the high number of proteins, it is difficult to identify the contribution of each protein to each axis. The disjoint biplot was produced with the help of the biplotbootGUI package; the data were analysed by performing 1000 iterations of the algorithm, obtaining the graph shown in Figure 4. The DBiplot shows a better structure in terms of interpretation, although the cancer-protein interaction is quite poor. A common characteristic of this technique is that most of the variables contribute mainly to the first axis. Finally, the SparseBiplot (Figure 5) shows an interaction between proteins and cancer types that is clearer and easier to interpret. It shows that the variables that contribute negatively to the second axis have higher values for the Luminal A cancer type. On the contrary, the proteins that contribute positively to the second axis have higher values for the Basal type and average values for the Her2 type. Axis 1 is a highly informative gradient (45.8%) and, together with axis 2, captures approximately 75% of the variability of the data.
The interpretation of the SparseBiplot makes possible the recognition of a proteomic characterisation of each of the groups, starting from a subset of the original proteins, since the rest have obtained null loadings.


Conclusions and Discussion
The theoretical contribution of this research represents significant progress for Big Data analysis and multivariate statistical methods. In this paper, we proposed an innovative algorithm to describe samples and variables in a low-dimensional space while keeping the most relevant variables. Additionally, the developed software is a valuable tool that enables the theoretical contribution to be applied to data analysis in any scientific field, as suggested by [41,42].
In contrast to the current dynamic of development of the biplot method, no reference was found that applies regularisation techniques to the biplot. Therefore, the main contribution of this paper is to propose a new biplot version with penalised loadings, called the elastic net HJ biplot, using the ridge (based on the l2 norm) and lasso (based on the l1 norm) regularisation methods [43].
Applying penalisation to induce sparsity in the HJ biplot produces a sparse loadings matrix for the resulting axes, allowing the less important variables to be omitted from the biplot. Consequently, the interpretability of the biplot's results becomes clear by knowing the variables that contribute the most information to each axis in the biplot, providing efficient solutions to problems arising from the high dimension of the data [44].
A package, called SparseBiplots [39], was developed in the R programming language to perform the proposed elastic net HJ biplot algorithm. As mentioned by [45], in some study designs the sample size is smaller than the number of variables. In particular, our package has the advantage of being able to solve this type of problem in two-way tables.
The presented methodology opens a new road to future research in multivariate statistics, as it serves as the basis for it to be extended to three-way data analysis techniques, such as Statis and Statis Dual [46], partial triadic analysis [47], Tucker [48], Tucker3 [49], Parallel Factor Analysis (PARAFAC) [50], among others.