1. Introduction
Ruben Gabriel, in his seminal paper of 1971, introduced the concept of a Biplot, which is a graphical representation of matrices that combines the visualization of rows and columns using points and vectors, respectively [1]. This method allows the visualization of relationships between different variables and observations in a data matrix, offering a powerful tool for principal component analysis (PCA). The Biplot facilitates the identification of patterns, groups, and relationships within the data, providing a visual interpretation that complements traditional analytical techniques. Its implementation has enabled the better understanding and communication of complex structures in multivariate datasets [2].
As shown in Figure 1, Gabriel built upon the previous work of several key authors in the fields of Mathematics and Statistics. The singular value decomposition (SVD) of matrices, which is fundamental to the Biplot, dates back to Eckart and Young, who developed principal transformations for non-Hermitian matrices [3]. Moreover, Gabriel references the work of Householder and Young, who addressed matrix approximation and latent roots, laying the groundwork for matrix factorization techniques used in his method [4]. These studies provided the mathematical framework necessary for the development of Gabriel’s Biplots.
The influence of Rao is also notable in Gabriel’s work. Rao, in [5,6,7], developed advanced statistical methods for biometric research and linear statistical inference, including the use and interpretation of principal component analysis in applied research. These methods and principles were essential for Gabriel to formulate and apply the Biplot effectively, adapting these techniques to enhance the visualization and analysis of multivariate data. The application of these principles allowed Gabriel to create a more accessible and practical tool for researchers.
Another significant author mentioned by Gabriel is Good [8], who explored applications of singular value decomposition in matrices, providing a deeper understanding of the relationships between matrix elements. Good’s ideas on matrix interpretation and complex data structure directly influenced Gabriel’s ability to visualize high-rank matrices using Biplots. Gabriel also acknowledges the influence of Hill [9] and Bennett [10], who explored large correlation matrices and independent parameters in score matrices, respectively. These studies provided a background on the importance of visualizing complex data structures and how these visualizations can reveal hidden patterns and meaningful relationships. The integration of these ideas allowed Gabriel to consolidate his Biplot method, creating a versatile and widely applicable tool for multivariate data analysis [8,9,10].
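In [1], Gabriel developed an innovative mathematical methodology to represent matrices through the Biplot. Based on the premise that any $n \times m$ matrix $\mathbf{Y}$ of rank $r$ can be factorized into two smaller-dimensional matrices, he proposed the following factorization:

$$\mathbf{Y} = \mathbf{G}\mathbf{H}^{\top},$$

where $\mathbf{G}$ is an $n \times r$ matrix and $\mathbf{H}$ is an $m \times r$ matrix. This factorization is not unique and can be written for each element $y_{ij}$ of matrix $\mathbf{Y}$ as the inner product of the corresponding vectors:

$$y_{ij} = \mathbf{g}_i\,\mathbf{h}_j^{\top},$$

where $\mathbf{g}_i$ is the $i$-th row of $\mathbf{G}$ and $\mathbf{h}_j$ is the $j$-th row of $\mathbf{H}$. To approximate matrices of rank higher than two, Gabriel used the singular value decomposition (SVD), which is expressed as shown below:

$$\mathbf{Y} = \sum_{\alpha=1}^{r} \lambda_\alpha\,\mathbf{p}_\alpha\,\mathbf{q}_\alpha^{\top},$$

where $\lambda_\alpha$ represents the singular values, $\mathbf{p}_\alpha$ represents the singular columns (vectors of length $n$), and $\mathbf{q}_\alpha^{\top}$ represents the singular rows (vectors of length $m$). A special case of this is the rank-two approximation of a matrix $\mathbf{Y}$, which is given as shown below:

$$\mathbf{Y}_{(2)} = \sum_{\alpha=1}^{2} \lambda_\alpha\,\mathbf{p}_\alpha\,\mathbf{q}_\alpha^{\top}.$$

The goodness of fit of this approximation is measured by the following:

$$\frac{\lambda_1^2 + \lambda_2^2}{\sum_{\alpha=1}^{r} \lambda_\alpha^2}.$$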
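The rank-two approximation and its goodness of fit can be illustrated numerically. The following is a minimal sketch using NumPy, not Gabriel's own procedure; the data matrix and all variable names (`Y`, `P`, `lam`, `Qt`, `G`, `H`) are hypothetical and chosen only to mirror the notation of the SVD.

```python
import numpy as np

# Hypothetical 6 x 4 data matrix (random values, for illustration only).
rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 4))

# Singular value decomposition: Y = sum_a lam[a] * outer(P[:, a], Qt[a, :]).
P, lam, Qt = np.linalg.svd(Y, full_matrices=False)

# Rank-two approximation: keep the two largest singular triplets.
Y2 = (P[:, :2] * lam[:2]) @ Qt[:2, :]

# Goodness of fit: share of the total sum of squares that is retained.
fit = (lam[:2] ** 2).sum() / (lam ** 2).sum()

# One of many possible factorizations Y2 = G @ H.T with two-column factors.
G = P[:, :2] * lam[:2]
H = Qt[:2, :].T

print(f"goodness of fit of the rank-two approximation: {fit:.3f}")
```

The rows of `G` and `H` are the points and vectors that would be drawn in a two-dimensional Biplot; a different split of the singular values between the two factors gives a different, equally valid, factorization.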
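Additionally, Gabriel proposed two types of factorizations $\mathbf{Y} = \mathbf{G}\mathbf{H}^{\top}$ for Biplots: one that satisfies the condition $\mathbf{H}^{\top}\mathbf{H} = \mathbf{I}_2$ (JK-Biplot) and another that satisfies $\mathbf{G}^{\top}\mathbf{G} = \mathbf{I}_2$ (GH-Biplot), where $\mathbf{I}_2$ is the identity matrix of order 2. In the former, relationships between rows are directly represented by the $\mathbf{g}$ vectors:

$$\mathbf{Y}\mathbf{Y}^{\top} = \mathbf{G}\mathbf{G}^{\top}.$$

In the latter, relationships between columns are represented by the $\mathbf{h}$ vectors:

$$\mathbf{Y}^{\top}\mathbf{Y} = \mathbf{H}\mathbf{H}^{\top}.$$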
For matrices of rank higher than two, the approximation via SVD and selection of principal components allows for the graphical representation of multivariate relationships in an understandable manner. In his analysis, Gabriel also introduced the concept of specific metrics to make the factorization $\mathbf{Y} = \mathbf{G}\mathbf{H}^{\top}$ and the Biplot unique except for rotations and reflections that do not change the relationships between vectors, such as the following metric, defined by a symmetric positive definite matrix $\mathbf{M}$:

$$\mathbf{G}\mathbf{G}^{\top} = \mathbf{Y}\mathbf{M}\mathbf{Y}^{\top}.$$

One must choose $\mathbf{H}$ such that

$$\mathbf{H}^{\top}\mathbf{M}\mathbf{H} = \mathbf{I}_r.$$

This mathematical implementation allows Biplots to be used in a wide range of applications in multivariate data analysis, improving both the visual and analytical interpretation of complex data structures.
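The two factorizations can be contrasted numerically. The sketch below assumes that both are built from the two leading singular triplets of a hypothetical matrix and differ only in which factor absorbs the singular values; the names `G_jk`, `H_jk`, `G_gh`, and `H_gh` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(8, 5))                      # hypothetical data matrix
P, lam, Qt = np.linalg.svd(Y, full_matrices=False)

P2 = P[:, :2]                                    # two leading singular columns
L2 = np.diag(lam[:2])                            # two leading singular values
Q2 = Qt[:2, :].T                                 # two leading singular rows, as columns

# JK-Biplot: H^T H = I_2, so the row markers absorb the singular values.
G_jk, H_jk = P2 @ L2, Q2

# GH-Biplot: G^T G = I_2, so the column markers absorb the singular values.
G_gh, H_gh = P2, Q2 @ L2

# Both factorizations reproduce the same rank-two approximation of Y.
Y2 = P2 @ L2 @ Q2.T
```

In this sketch the products of the row markers in the JK factorization reproduce those of the rows of the rank-two approximation, and the products of the column markers in the GH factorization reproduce those of its columns, which is the sense in which each Biplot favors one set of markers.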
The GH-Biplot and JK-Biplot are two methods used for the graphical representation of multivariate matrices, each with a specific focus on the quality of representation. The GH-Biplot focuses on obtaining the maximum representation quality for variables, which is crucial in applications such as genomic data analysis, market research, and quality control in manufacturing. In contrast, the JK-Biplot seeks to best represent rows or individuals, being useful in customer segmentation, student performance analysis, and user behavior analysis. Both methods share the common goal of providing a clear and comprehensible visual representation of the data, but they differ in their main focus [1].
While the GH-Biplot is applied in contexts where the precise representation of variables is critical, the JK-Biplot is better suited for analyzing and comparing individuals or observations. Despite these differences, both methods are widely applicable in multivariate data analysis, providing valuable tools for the visualization and understanding of complex data.
For matrices $\mathbf{Y}$ of observations of $n$ units in $m$ variables, it is essential to subtract the mean of each variable to center the data before proceeding with the analysis. Gabriel uses the estimated variance–covariance matrix $\mathbf{S}$ to capture the variability and the linear relationships between the variables. This matrix is defined as

$$\mathbf{S} = \frac{1}{n}\,\mathbf{Y}^{\top}\mathbf{Y},$$

where $\mathbf{Y}$ is the centered data matrix, $\mathbf{Y}^{\top}$ is the transpose of $\mathbf{Y}$, and $n$ is the number of observations. The matrix $\mathbf{S}$ provides a measure of how the variables co-vary, which is essential for multivariate analysis. The standardized distances between units $i$ and $e$ are calculated to understand the relative differences between observations in multivariate space. This distance is defined as

$$d_{ie}^{2} = (\mathbf{y}_i - \mathbf{y}_e)\,\mathbf{S}^{-1}\,(\mathbf{y}_i - \mathbf{y}_e)^{\top},$$

where $\mathbf{y}_i$ and $\mathbf{y}_e$ are row vectors corresponding to units $i$ and $e$, and $\mathbf{S}^{-1}$ is the inverse of the variance–covariance matrix. This formula allows measuring the similarity between observations, adjusting for the variance and correlations among variables.

In a Biplot, standardized distances can be approximated to simplify the graphical representation. Gabriel proposes the following approximation:

$$d_{ie} \approx \lVert \mathbf{g}_i - \mathbf{g}_e \rVert,$$

where $\mathbf{g}_i$ and $\mathbf{g}_e$ are vectors that represent the observations in the reduced space of the Biplot. This approximation facilitates visual interpretation, allowing for a direct comparison of the distances between points in the Biplot. The Biplot also allows for the visual representation of the variances and covariances of the variables:

$$s_{jk} \approx \mathbf{h}_j\,\mathbf{h}_k^{\top}, \qquad s_j^{2} \approx \lVert \mathbf{h}_j \rVert^{2}, \qquad r_{jk} \approx \cos(\mathbf{h}_j, \mathbf{h}_k).$$

These equations enable the Biplot to visually represent the statistical relationships between variables, facilitating the interpretation of variance, covariance, and correlation in the data [11]. The vectors $\mathbf{h}$ in the Biplot indicate how the variables co-vary and provide an intuitive way to visualize associations between them. This is especially useful in identifying patterns and relationships in complex data, making the Biplot a valuable tool for multivariate analysis.
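These approximations can be checked on a small example. The sketch below is illustrative rather than Gabriel's own code: it assumes a covariance matrix with divisor $n$ and a GH-type scaling in which the row markers are multiplied by $\sqrt{n}$ and the column markers divided by it. The data and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3))              # hypothetical raw data: 10 units, 3 variables
n = X.shape[0]

Y = X - X.mean(axis=0)                    # subtract the mean of each variable
S = Y.T @ Y / n                           # variance-covariance matrix (divisor n assumed)
S_inv = np.linalg.inv(S)

P, lam, Qt = np.linalg.svd(Y, full_matrices=False)
G = np.sqrt(n) * P                        # row markers, all dimensions kept
H = Qt.T * lam / np.sqrt(n)               # column markers, one row per variable


def standardized_distance(i: int, e: int) -> float:
    """Standardized distance between units i and e under the metric S^{-1}."""
    diff = Y[i] - Y[e]
    return float(np.sqrt(diff @ S_inv @ diff))


i, e = 0, 1
exact = standardized_distance(i, e)
full = float(np.linalg.norm(G[i] - G[e]))             # matches `exact` in full rank
planar = float(np.linalg.norm(G[i, :2] - G[e, :2]))   # what a two-dimensional Biplot shows
```

With all dimensions retained, the distance between row markers reproduces the standardized distance and `H @ H.T` reproduces `S`; restricting both sets of markers to their first two coordinates gives the approximations that are read off the plot.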
Although Gabriel’s Biplots are powerful tools for the visualization of multivariate data, they have a crucial limitation: they only allow for the maximum quality of representation for either variables or individuals but not both simultaneously. This limitation was identified by Galindo in 1986 [12], who developed the HJ-Biplot technique. This technique addresses it by maximizing the quality of representation for both variables and individuals in the same plane or low-dimensional space. This innovation allows for a more complete and balanced interpretation of the data, facilitating a more robust and detailed analysis of multivariate structures.
The
HJ-Biplot technique shares similar principles with Benzecri’s
correspondence analysis (CA) [
13] and
multiple correspondence analysis (MCA) but with extended versatility that allows its application to any type of data, not just frequencies [
12]. As shown in
Figure 2, the
HJ-Biplot is based on singular value decomposition (SVD) and allows projecting both individuals and variables in the same space, providing a graphical representation that facilitates the interpretation of multivariate relationships.
Unlike CA and MCA, which are primarily designed for categorical data and contingency tables, the
HJ-Biplot can handle continuous, categorical, and mixed data. This flexibility makes it particularly useful in studies where variables of different types must be analyzed simultaneously [
14].
The HJ-Biplot stands out for its ability to provide a symmetric representation of rows and columns, ensuring that the quality of representation is optimal for both clouds. This contrasts with other Biplots where the quality of representation is not the same for rows and columns. The technique maximizes the quality of the representation using appropriate metrics in the row and column spaces, allowing for the accurate interpretation of relative positions and interrelations.
Additionally, the HJ-Biplot can incorporate external information, such as additional categorical variables, to enrich the analysis and provide deeper context. This capability makes it a powerful tool for exploring and visualizing complex data, facilitating the identification of hidden patterns and trends.
Given a data matrix $\mathbf{X}$, with dimensions $n \times p$, where $n$ is the number of individuals and $p$ is the number of variables, this matrix contains the observations of each individual for each variable. It is common to center the matrix $\mathbf{X}$ to eliminate the effect of different average levels of the variables. This is accomplished by subtracting the mean of each column (variable) from each of the elements in that column, thus obtaining the centered matrix $\mathbf{X}_c = \mathbf{X} - \mathbf{1}_n\bar{\mathbf{x}}^{\top}$, where $\mathbf{1}_n$ is a column vector of ones of length $n$ and $\bar{\mathbf{x}}$ is a vector of means of length $p$.
The centered matrix $\mathbf{X}_c$ is decomposed using SVD. This decomposition is a generalization of eigenvalue and eigenvector decomposition for non-square matrices. The SVD of $\mathbf{X}_c$ is expressed as $\mathbf{X}_c = \mathbf{U}\mathbf{D}\mathbf{V}^{\top}$, where $\mathbf{U}$ is an orthogonal matrix with dimensions $n \times n$, whose column vectors are the left singular vectors; $\mathbf{D}$ is a diagonal matrix of dimensions $n \times p$ with the singular values of $\mathbf{X}_c$ on its diagonal; and $\mathbf{V}$ is an orthogonal matrix with dimensions $p \times p$, whose column vectors are the right singular vectors. In practice, the first $k$ singular values that explain the majority of the variability in the data are selected. Thus, the first $k$ columns of $\mathbf{U}$ and $\mathbf{V}$ and the first $k$ singular values of $\mathbf{D}$ are selected, giving $\mathbf{U}_k$, $\mathbf{V}_k$, and $\mathbf{D}_k$.
The coordinates of the individuals in the reduced space are obtained by multiplying $\mathbf{U}_k$ by $\mathbf{D}_k$, resulting in $\mathbf{J} = \mathbf{U}_k\mathbf{D}_k$. Similarly, the coordinates of the variables are obtained by multiplying $\mathbf{V}_k$ by $\mathbf{D}_k$, obtaining $\mathbf{H} = \mathbf{V}_k\mathbf{D}_k$. In the HJ-Biplot, both individuals (rows) and variables (columns) are represented in the same two-dimensional or three-dimensional space, if applicable. The coordinates $\mathbf{J}$ and $\mathbf{H}$ are used to plot these points on a graph. The interpretation of the relative positions of the points is key in the HJ-Biplot: the proximity between points of individuals indicates similarities between them, while the orientation and length of the vectors of the variables indicate the direction and magnitude of their influence on the data.
The orthogonal projection of an individual’s point onto a variable vector approximates the value of the individual on that variable. The quality of the representation in the HJ-Biplot can be evaluated using the proportion of the variability explained by the selected singular values. The sum of the squares of the selected singular values, divided by the sum of the squares of all singular values, provides a measure of the quality of the representation. This quality is expressed as follows:

$$\frac{\sum_{i=1}^{k} \lambda_i^{2}}{\sum_{i=1}^{r} \lambda_i^{2}},$$

where $\lambda_i$ represents the singular values, $k$ is the number of selected singular values, and $r$ is the rank of the centered matrix. The HJ-Biplot also allows the incorporation of external information to enrich the analysis. This can be accomplished by including additional categorical variables that were not in the original dataset. These additional variables are projected into the same space, providing additional context for interpretation [15]. Interpretation of the HJ-Biplot involves examining the relative positions of the individuals and variables on the graph. Some key questions include the following. Which individuals are close to each other, and what does this mean in terms of the variables? Which variables are strongly correlated as indicated by nearby vectors? Which individuals are most influenced by specific variables as reflected in the proximity of their projections on the graph?
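The computation described above (centering, SVD, selection of the first $k$ dimensions, and scaling of both sets of coordinates by the singular values) can be sketched in a few lines of NumPy. This is a minimal illustration under those stated steps, not the implementation used by any of the packages discussed in this review; the function name `hj_biplot` and the random data are hypothetical.

```python
import numpy as np


def hj_biplot(X, k=2):
    """Return row coordinates J, column coordinates H, and the quality of representation."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each variable
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    J = U[:, :k] * d[:k]                           # individuals: U_k D_k
    H = Vt[:k, :].T * d[:k]                        # variables:   V_k D_k
    quality = (d[:k] ** 2).sum() / (d ** 2).sum()  # share of variability retained
    return J, H, quality


rng = np.random.default_rng(3)
X = rng.normal(size=(12, 4))                       # hypothetical 12 x 4 data matrix
J, H, quality = hj_biplot(X, k=2)
print(J.shape, H.shape, round(float(quality), 3))
```

In this sketch `J.T @ J` and `H.T @ H` are both equal to the diagonal matrix of squared retained singular values, which is one way to see the symmetric treatment of rows and columns.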
Table 1 below evaluates three Biplot techniques (GH-Biplot, JK-Biplot, and HJ-Biplot) in terms of their global goodness of fit and their goodness of fit for rows and columns. Each technique is analyzed using specific mathematical formulas that measure the proportion of the total variability explained. For the GH-Biplot, the global goodness of fit is calculated as the sum of the squares of the first $k$ singular values ($\lambda_1, \dots, \lambda_k$) divided by the sum of the squares of all singular values. This global measure provides an overview of the representation quality. The goodness of fit for rows in the GH-Biplot is simplified as $2/r$, where $r$ is the rank of the matrix, while for the columns, the proportion of variability explained by the first two singular values is used over the total.
The JK-Biplot, on the other hand, uses the same formula for global goodness of fit as the GH-Biplot, that is, the proportion of total variability explained by the first $k$ singular values. However, it differs in how it calculates the goodness of fit for rows and columns. For the rows, the JK-Biplot employs the following formula, written in terms of the singular values $\lambda_i$:

$$\frac{\lambda_1^{4} + \lambda_2^{4}}{\sum_{i=1}^{r} \lambda_i^{4}},$$

which implies a more accurate representation based on the first two singular values. For the columns, the technique simplifies the goodness of fit to $2/r$, which is similar to what the GH-Biplot does for the rows. This difference in the formulas reflects an alternative approach to data representation and adjustment.
The HJ-Biplot is notable for its symmetry in the representation of rows and columns. It uses the same formula for global goodness of fit as the other two techniques:

$$\frac{\sum_{i=1}^{k} \lambda_i^{2}}{\sum_{i=1}^{r} \lambda_i^{2}},$$

ensuring a consistent measure of explained variability. For both rows and columns, the HJ-Biplot uses (1), which ensures high-quality representation for both rows and columns. This symmetry is a significant advantage of the HJ-Biplot, providing a balanced and accurate representation of multivariate data, making it a robust and versatile technique for various types of analyses.
To address the practical implementation of the
HJ-Biplot technique, it is relevant to mention that its development and application have been supported by the creation of various software tools. These tools have enabled deeper analysis and interactive visualizations, facilitating the interpretation of multivariate data. In this context,
Table 2 presents some of the programs and packages specifically developed for the study and application of the
HJ-Biplot, highlighting their evolution and adaptation to different platforms and research needs. This technological advancement has been key to the dissemination and use of this technique in various fields, allowing researchers to effectively explore and communicate the complex structures present in their data.
As shown in
Table 2, various programs and packages have been developed to support the
HJ-Biplot technique, providing tools for its implementation and visualization. These include
R packages such as
BPCA and
MultibiplotGUI as well as more recent applications like
SparseBiplots and
LDABiplots. These programs allow for an in-depth analysis and the generation of interactive graphics that facilitate data interpretation, reflecting the growing importance of the
HJ-Biplot in research and professional practice.
Each software tool implementing the HJ-Biplot technique offers distinct advantages and limitations, making each suitable for different research contexts. For instance, MultBiplot [16] is widely recognized for its user-friendly interface and robust handling of structured datasets, making it an excellent choice for researchers new to multivariate analysis. However, its performance can be limited when dealing with very large datasets or high-dimensional data. Another relevant software is BPCA (Biplot based on principal components analysis) [17], which allows the creation of 2D and 3D Biplots using the HJ-Biplot method based on principal components. SparseBiplots [23], in turn, addresses the difficulties posed by large and high-dimensional datasets by incorporating penalization methods like LASSO and Elastic Net, which are particularly effective for reducing dimensionality and improving interpretability in genomics and big data applications. Despite its advanced capabilities, SparseBiplots requires a deeper understanding of penalization techniques, which may pose a challenge for less experienced users.
Similarly,
PyBiplots [
22] stands out for its compatibility with
Python, which is a programming language widely used in machine learning and data science. This makes it an attractive option for researchers looking to integrate
HJ-Biplot analysis into broader machine learning workflows. However, the lack of extensive documentation and community support for
PyBiplots can be a drawback for users unfamiliar with
Python. Meanwhile,
GGEBiplotGUI [
18] is specifically designed for agricultural research, offering specialized tools for analyzing genotype-by-environment interactions. While it excels in this niche, its applicability to other fields may be limited. These examples highlight the importance of selecting the right tool based on the specific requirements of the research project, balancing ease of use, functionality, and compatibility with existing workflows.
In this context, the present systematic review and meta-analysis seek to provide a comprehensive synthesis of the studies utilizing the HJ-Biplot technique from its inception in 1986 to December 2024. The sources of these studies include the Scopus and Web of Science databases as well as the repository of master’s and doctoral research at the Department of Statistics of the University of Salamanca. This analysis aims to explore the evolution and application of the HJ-Biplot technique in diverse disciplines, identify the most recurrent areas of application, and evaluate its contribution to data analysis and representation in multivariate scenarios.
To ensure methodological rigor, the PRISMA 2020 framework has been followed, encompassing aspects such as the clear definition of objectives, systematic search of relevant literature, data extraction, and critical appraisal of the included studies. This systematic review aims not only to highlight the theoretical and practical contributions of the HJ-Biplot technique but also to serve as a basis for future research, expanding its applicability and integration with emerging technologies such as artificial intelligence and machine learning.
3. A New Seven-Phase Methodology for Systematic Literature Review
3.1. Phase 1: Article Screening
In this initial phase, we present the results of an extensive textual analysis of 121 articles on the
HJ-Biplot technique by Galindo, obtained from a search in the Web of Science (WOS) and Scopus databases, which yielded 204 initial results. This was conducted using the following search equation:
“HJ-Biplot” OR “HJ Biplot” OR “HJBiplot” OR “HJ biplot” OR “HJ-biplot” OR “GH-Biplot” OR “JK-Biplot” OR “GH Biplot” OR “JK Biplot” OR “GH biplot” OR “JK biplot” OR “GHbiplot” OR “JKbiplot”, covering variants and related techniques. As shown in
Figure 4, after applying the
PRISMA 2020 protocol to assess the relevance and quality of the articles [
68], 12 were excluded for incomplete information. Subsequently, 78 duplicate articles were removed, resulting in a base of 114 articles: 83 from WOS and 31 from Scopus.
In addition to the initial search in WOS and Scopus, further research was conducted using the University of Salamanca databases. This step allowed the inclusion of 14 additional articles that were not captured by the WOS or Scopus search engines. Among these, the inclusion of Galindo’s seminal 1986 article, a key study for the development of the HJ-Biplot technique, stood out. With this addition, the total number of articles considered rose to 128.
In the final screening phase, an evaluation of the eligibility of the articles was carried out to ensure that they met the established inclusion criteria. During this review, it was identified that 6 articles from WOS and 1 article from Scopus did not meet the quality or relevance criteria, leading to their exclusion. As a result, 77 articles from WOS, 30 from Scopus, and 14 additional articles not captured by the search engines were selected, giving a final total of 121 articles for analysis.
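The counts reported across the screening steps can be reconciled with a short arithmetic check. The sketch below only restates the figures given in the text; the variable names are illustrative.

```python
# Figures as reported in the screening description.
initial_records = 204            # WOS + Scopus search results
excluded_incomplete = 12         # removed for incomplete information
duplicates_removed = 78          # duplicate records removed

after_deduplication = initial_records - excluded_incomplete - duplicates_removed

by_source = {"WOS": 83, "Scopus": 31}         # composition of the deduplicated base
additional_salamanca = 14                     # University of Salamanca databases
after_additions = after_deduplication + additional_salamanca

excluded_at_eligibility = {"WOS": 6, "Scopus": 1}
final_by_source = {
    source: by_source[source] - excluded_at_eligibility[source]
    for source in by_source
}
final_total = sum(final_by_source.values()) + additional_salamanca

print(after_deduplication, after_additions, final_by_source, final_total)
```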
Figure 5 represents a co-occurrence analysis of terms related to the use of the
HJ-Biplot and its applications in various research areas. The larger nodes, such as “hj-biplot”, “multivariate analysis”, and “human”, indicate central terms that have been frequently mentioned together in the analyzed texts, suggesting their relevance in the literature. The colors differentiate thematic groups; for example, the green group is associated with sustainability and sustainable development topics, while the red group is related to risk factors and gender violence. Terms such as “sustainability”, “corporate social responsibility”, and “cluster analysis” also highlight its application in ecological, business, and social studies. The connections between the nodes suggest how these themes are interrelated, reflecting a multidisciplinary evolution of the
HJ-Biplot in diverse fields such as health, the environment, and social sciences.
3.2. Phase 2: Data Preparation
Once the eligible articles were obtained, they were coded for processing in the
IRAMUTEQ software, as shown in
Figure 6. For the coding, an additional column was added to the
Excel file containing the articles, where a specific code was inserted that concatenated an article identifier with its abstract. This code was designed to be placed later in a plain text file in
UTF-8 format, which would serve as the corpus for processing by
IRAMUTEQ.
The coding process followed the standards specified by IRAMUTEQ, including the insertion of four asterisks to indicate variables and modalities. This structure in Excel not only organized the data systematically but also ensured that each article was correctly identified and prepared for analysis in the software. The format used in the coding is essential for IRAMUTEQ to properly recognize the variables and modalities of each text, allowing for efficient and accurate analysis of the textual corpus.
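As an illustration of this coding step, the sketch below builds a plain-text corpus in which each article begins with a header line of four asterisks followed by starred variable_modality tags, and then writes it as UTF-8. It is an assumption-laden example rather than the authors' workflow: the variable names `art` and `year`, the placeholder abstracts, and the output file name are hypothetical.

```python
import tempfile
from pathlib import Path


def corpus_entry(article_id: str, year: int, abstract: str) -> str:
    """Return one text block: a '****' header line followed by the abstract."""
    header = f"**** *art_{article_id} *year_{year}"
    body = " ".join(abstract.split())   # collapse line breaks and repeated spaces
    return f"{header}\n{body}\n"


# Placeholder records standing in for (identifier, year, abstract) rows of the Excel file.
records = [
    ("001", 1986, "Placeholder abstract for the first article."),
    ("002", 2016, "Placeholder abstract\nfor the second article."),
]

corpus = "\n".join(corpus_entry(*record) for record in records)

# Write the corpus as a UTF-8 plain-text file (hypothetical file name).
corpus_path = Path(tempfile.gettempdir()) / "hj_biplot_corpus.txt"
corpus_path.write_text(corpus, encoding="utf-8")
```

Keeping the identifier and any grouping variable in the header line is what later allows each text segment to be traced back to its article and period.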
3.3. Phase 3: Data Loading and Processing with the IRAMUTEQ Software
Once the articles were coded and stored in a plain text file, the corpus was loaded into IRAMUTEQ for analysis. The corpus must meet specific structural criteria, including UTF-8 encoding, text segmentation, and the selection of dictionaries and lemmatization options. Lemmatization is crucial for standardizing linguistic forms, transforming verbs into their infinitive form, nouns into singular, and adjectives into masculine, thus facilitating the detection of patterns and relationships in the texts. This process ensures that the analysis is carried out consistently and accurately, providing a faithful representation of the content of the articles.
After configuring and validating the corpus, the text indexing process was initiated, which can take several minutes depending on the size of the corpus. This process is essential for preparing the corpus for the various analyses that IRAMUTEQ offers, such as textual statistics, similarity analysis, hierarchical classification, and the generation of word clouds. Proper coding and preparation of the corpus are crucial to obtaining representative and accurate results, allowing valuable conclusions to be drawn about Galindo’s HJ-Biplot technique from the selected articles.
With the validated corpus, correspondence factor analysis (CFA) was performed. CFA allows for the exploration of relationships between rows and columns in a contingency table. In our study, it was applied to identify co-occurrence patterns among the words used in the abstracts of articles about the HJ-Biplot. This analysis helps visualize clusters of frequently occurring terms and their proximity to certain articles, making it easier to understand how certain concepts are interrelated in the literature on this technique.
CFA also detects specificities within the corpus, identifying terms characteristic of certain subgroups. This is particularly useful when analyzing texts from different eras or research areas, as it allows us to observe how certain terms or topics are associated with specific contexts. Upon completing the CFA, as seen in
Figure 7, a
lexical matrix is obtained that represents the frequencies of key terms in each article, revealing important patterns in the corpus and providing a detailed view of the evolution and distribution of topics related to the
HJ-Biplot.
3.4. Phase 4: Characterization of the Lexical Matrix with the R Software
The obtained lexical matrix consists of 265 variables (words) and 121 individuals (article identifiers), resulting in a size of
. This matrix was saved in CSV format. For the creation of the
Canonical Biplot, it is necessary to transpose the matrix and apply a
characterization factor, as suggested in [
69]. This factor is calculated using the following formula:
which adjusts the matrix values based on the maximum of each row
i and column
j, allowing for proper characterization of the data. The process is carried out in the
R software
version 4.4.1 [
70], starting with the transposition of the matrix, which is followed by applying the formula to adjust the values. These partial results are shown in
Figure 8. Finally, the results are saved in an
Excel file, ensuring the correct association between the article identifiers and their corresponding data. This procedure optimizes the data matrix for the
Canonical Biplot analysis, ensuring accurate visualization and meaningful interpretation.
3.5. Phase 5: One-Way Canonical Biplot Analysis with the MultBiplot Software
Based on the characterized matrix, an additional column with the publication year of each article and another column labeled “group” were added. In this latter column, four publication period groups were established, which were distributed as follows:
Group 1 (1986–1999),
Group 2 (2000–2007),
Group 3 (2008–2015), and
Group 4 (2016–2024). These groups are essential for performing the one-way
Canonical Biplot analysis, as they allow segmenting publications according to the mentioned periods. The data were then loaded into the
MultBiplot version 18.0312 software [
16], developed at the University of Salamanca by the Applied Multivariate Analysis research group, to conduct the corresponding
MANOVA-Biplot, which will allow us to analyze the variability between the groups and associations with the lexical variables in the different periods.
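The assignment of each article to a publication period can be expressed as a simple lookup. The sketch below encodes the four periods listed above; the function name `publication_group` is illustrative, and years outside 1986–2024 are rejected because the review does not cover them.

```python
# Publication periods used as the grouping factor: (first year, last year, group number).
PERIODS = [
    (1986, 1999, 1),
    (2000, 2007, 2),
    (2008, 2015, 3),
    (2016, 2024, 4),
]


def publication_group(year: int) -> int:
    """Return the group number (1 to 4) for a publication year."""
    for first, last, group in PERIODS:
        if first <= year <= last:
            return group
    raise ValueError(f"year {year} is outside the 1986-2024 review window")


groups = [publication_group(y) for y in (1986, 2003, 2015, 2024)]
print(groups)
```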
As shown in
Table 3, the result of the
Canonical Biplot shows the main dimensions, their eigenvalues, and the variance explained by each of these dimensions. In this case, the first dimension has an eigenvalue of 16.07, explaining 48.39% of the total variance of the model. The second dimension has an eigenvalue of 13.04, explaining 31.85% of the variance, while the third dimension, with an eigenvalue of 10.27, explains the remaining 19.76% of the variance.
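As a quick arithmetic check, the reported percentages are numerically consistent with each dimension's share of the sum of squared eigenvalues. This correspondence is an observation drawn from the figures quoted above, not a documented description of how MultBiplot computes them, and small differences are expected because the eigenvalues are rounded to two decimals.

```python
eigenvalues = [16.07, 13.04, 10.27]       # values quoted from Table 3
reported = [48.39, 31.85, 19.76]          # explained variance (%) quoted from Table 3

squares = [value ** 2 for value in eigenvalues]
total = sum(squares)
explained = [100 * square / total for square in squares]

# Variance captured jointly by the first two canonical axes.
first_two = explained[0] + explained[1]

for dim, (calc, rep) in enumerate(zip(explained, reported), start=1):
    print(f"dimension {dim}: computed {calc:.2f}%, reported {rep:.2f}%")
print(f"first two dimensions together: {first_two:.2f}%")
```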
Another important result is the
quality of the representation of group means on the
canonical axes.
Table 4 shows how the different publication periods are projected onto the two main axes of the analysis. We observe that Group 4 (2016–2024) is very well represented, with a value of 877 on Axis 1 and 104 on Axis 2, yielding a cumulative value of 981. This indicates that most of the variability of this group is captured in the first axis, with a fairly high representation also in the second axis, suggesting that this group has a strong presence in the analyzed dimensions.
On the other hand, Group 2 (2000–2007) shows the lowest representation, with a value of 227 on Axis 1 and 251 on Axis 2, accumulating a total of 478. This low cumulative value indicates that this group is not well represented on the main axes of the Biplot, suggesting that the characteristics of this period do not align as clearly with the primary directions of variability in the data. In contrast, Group 1 (1986–1999) and Group 3 (2008–2015) show more balanced representations, with cumulative values of 827 and 975, respectively, highlighting that these groups have a good projection on at least one of the main axes, although each shows different focal points of representation with Group 1 better represented on Axis 1 and Group 3 on Axis 2.
The
Canonical Biplot graph in
Figure 9 shows that Axis 1 explains 48.39% of the variance, and Axis 2 explains 31.85%, meaning that together they represent 80.24% of the variability in the data. The publication period groups are distributed according to techniques derived from the
HJ-Biplot [
12]. The period 2016–2024 is associated with more recent techniques, such as
Cenet HJ-Biplot [
71] and
Sparse HJ-Biplot [
72], indicating a diversification in the development of these techniques. In contrast, the period 1986–1999 reflects the beginning of the
HJ-Biplot, primarily associated with its original version, while the period 2008–2015 is linked to techniques such as the
Bootstrapping HJ-Biplot [
73] and
Dynamic Biplot [
74], suggesting an interest in adding robustness and dynamism to the analysis.
In addition to the technical evolution, the Biplot also reveals how the applications of the HJ-Biplot have changed over different periods. The period 2016–2024 shows a strong relationship with contemporary topics such as the impact of COVID-19, food security, and environmental issues with a significant geographical focus on Latin America. On the other hand, the period 2008–2015 is associated with terms such as sustainability, agriculture, and water resources, indicating the use of these techniques to address global sustainability issues. The period 1986–1999 shows a more academic focus with applications of the HJ-Biplot in survival studies and university environments.
The period 2000–2007 is linked to terms such as crime, pollution, and gender issues, suggesting that during this stage, HJ-Biplot-derived techniques began to be applied in social and environmental studies, addressing humanitarian and social issues. This marks a transition toward greater diversification in the applications of the technique, expanding to socially relevant topics and covering a broader spectrum of contemporary issues.
Although the Canonical Biplot has proven to be a useful tool for visualizing the evolution and applications of the HJ-Biplot over time, its capacity to interpret the words themselves is often limited. This is where artificial intelligence, such as ChatGPT, comes into play by providing a deeper contextual analysis, allowing for the better interpretation of the words and topics within the texts. By complementing the Biplot results with the capabilities of ChatGPT, a clearer and more precise understanding of the relationships and patterns present in the texts is achieved, significantly enriching the textual analysis.
3.6. Phase 6: Use of ChatGPT 3.5 Artificial Intelligence for Textual Analysis
To add a further layer of information to the systematic literature review on the HJ-Biplot, artificial intelligence, specifically ChatGPT 3.5, was employed as a complementary tool. This tool not only enables the analysis of large volumes of text but also contextualizes and extracts relevant patterns that may not be evident through traditional text mining methods. The goal was to use ChatGPT to process the abstracts of articles from each previously defined study period, extracting the most relevant information in chronological order and focusing on identifying key articles, derivative techniques of the HJ-Biplot, and the applications and evolution of these studies.
To carry out this process, specific prompts were designed in
ChatGPT 3.5, as shown in
Table 5. These prompts aimed to extract chronological information, identify seminal articles, and analyze the evolution and applications of the
HJ-Biplot. This integrative approach provided an additional layer of analysis, enriching the understanding of the data collected through the
Canonical Biplot.
By inputting the article abstracts from each period along with these prompts, ChatGPT 3.5 enabled not only a clearer synthesis of findings but also the identification of trends and evolutionary patterns. This facilitated the understanding of how the HJ-Biplot has been adapted and applied in different contexts and how research on this technique has progressed over time. The information extracted by ChatGPT 3.5 was carefully cross-referenced with the data obtained through the Canonical Biplot, so that the derived conclusions were supported by both sources.
After executing these prompts in
ChatGPT 3.5 and loading the articles in groups of 10 with their publication year, authors, title, and abstract, a comprehensive set of results was generated. These results, which include key findings, chronological trends, and the evolution of
HJ-Biplot applications, are summarized in
Table 6. This table provides a detailed overview of the most relevant insights extracted by
ChatGPT 3.5, organized by year and highlighting significant milestones in the development and application of the
HJ-Biplot technique. The information represents a critical component of the analysis, offering a structured and accessible synthesis of the
AI-generated data.
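The batching step described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: the record fields mirror the four items named in the text (publication year, authors, title, abstract), while the function names, the placeholder records, and the text layout are hypothetical. The actual prompts are those listed in Table 5 and are not reproduced here.

```python
from typing import Dict, Iterator, List

BATCH_SIZE = 10  # group size stated in the text


def make_batches(records: List[Dict[str, str]], size: int = BATCH_SIZE) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive groups of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def render_batch(batch: List[Dict[str, str]]) -> str:
    """Format one batch as plain text to paste after a prompt (layout is an assumption)."""
    blocks = []
    for i, rec in enumerate(batch, start=1):
        blocks.append(
            f"Article {i}\n"
            f"Year: {rec['year']}\n"
            f"Authors: {rec['authors']}\n"
            f"Title: {rec['title']}\n"
            f"Abstract: {rec['abstract']}"
        )
    return "\n\n".join(blocks)


# Placeholder records for illustration only; not data from the review.
records = [
    {"year": str(2000 + i), "authors": f"Author {i}", "title": f"Title {i}", "abstract": f"Abstract {i}"}
    for i in range(25)
]
batches = list(make_batches(records))
print([len(b) for b in batches])  # [10, 10, 5]
```

Keeping the four fields labeled in every record makes it easier to trace each statement in the model's answer back to a specific article during cross-referencing.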
The integration of
ChatGPT 3.5 into the systematic literature review on the
HJ-Biplot proved to be a transformative approach, enabling a deeper and more nuanced understanding of the technique’s evolution and applications. By leveraging AI-generated insights, this study not only identified key trends and seminal works but also uncovered patterns that traditional methods might have overlooked. The results presented in
Table 6 represent only a portion of the extensive information extracted by
ChatGPT 3.5, which includes detailed chronological analyses, key article identification, and the evolution of
HJ-Biplot applications across various fields. However, these AI-generated insights must be carefully cross-referenced with the findings from the
Canonical Biplot and text mining analyses, which were conducted as preliminary steps in this study. This triangulation of methods ensures the robustness and reliability of the conclusions, as it combines the strengths of AI-driven textual analysis with the precision of multivariate visualization and the depth of lexicometric exploration. This innovative methodology underscores the potential of combining advanced AI tools with traditional analytical techniques to enhance research outcomes, paving the way for future interdisciplinary studies that can address complex, multidimensional problems with greater precision and insight.
3.7. Phase 7: Consolidation of Information and Creation of the Timeline of the HJ-Biplot Technique Development
The
HJ-Biplot, introduced by Galindo in 1986, was a significant advancement in multivariate analysis, enabling the simultaneous representation of data matrices with superior fitting for both rows and columns in the same reference system [
12]. Following its introduction, the technique was applied in various fields. Orfao in [
75] used it to define characteristics of large lymphocyte populations in B-CLL cases. In 1991, Galante explored the spatial distribution of dung-feeding scarabs in Mediterranean pasturelands [
76], while Santos et al. differentiated young wines by anthocyanin profiles, showcasing the method’s ability for classification and discrimination [
77]. Rivas et al. (1993) expanded its use to classify wines geographically based on phenolic and chemical variables [
78]. Additionally, Meder et al. (1994) applied the
HJ-Biplot to distinguish grape varieties by anthocyanin composition, demonstrating its robustness in agricultural and food sciences [
79].
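For readers less familiar with the construction, the simultaneous representation mentioned at the start of this section can be sketched in a few lines. The sketch follows the usual description of the HJ-Biplot: the centered data matrix is decomposed as X = UΣVᵀ, and rows and columns are plotted with markers J = UΣ and H = VΣ, respectively. The function name, the standardization option, and the toy matrix are assumptions made for illustration and are not taken from [12].

```python
import numpy as np


def hj_biplot(X: np.ndarray, n_components: int = 2, scale: bool = True):
    """Return row markers J, column markers H, and explained-inertia shares."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)             # center each column
    if scale:
        Z = Z / X.std(axis=0, ddof=1)  # optional standardization (an assumption here)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    k = n_components
    J = U[:, :k] * s[:k]               # row markers: J = U * Sigma
    H = Vt[:k].T * s[:k]               # column markers: H = V * Sigma
    explained = s**2 / np.sum(s**2)    # share of total inertia per axis
    return J, H, explained[:k]


# Toy 5 x 3 matrix (made-up numbers, for illustration only).
X = np.array([
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 2.0],
    [5.0, 10.1, 1.5],
    [6.0, 12.0, 3.0],
])
J, H, explained = hj_biplot(X)
print(J.shape, H.shape)  # (5, 2) (3, 2)
```

Because both marker sets carry the singular values, rows and columns have the same dispersion along each axis, which is what allows them to be read in one reference system. The trade-off is that the product of J and the transpose of H no longer reproduces the original matrix.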
In 1996, Galindo advanced the application of the
HJ-Biplot with the study
“Comparative Study of the Ordering of Ecological Communities Based on Factorial Techniques” [
80], demonstrating its effectiveness in analyzing ecological communities compared to other multivariate techniques. In 1999, Galindo applied the method to aquatic ecosystems in
“HJ-Biplot analysis as a tool for studying an aquatic ecosystem” [
81], emphasizing its utility in interpreting ecosystem indices, while Garcia-Talegon et al. extended its use to geology, analyzing the origin and evolution of building stones based on chemical composition [
82]. The versatility of the
HJ-Biplot was further showcased in 2005 when Alarcón et al. identified personality clusters among Chilean adolescents in
“Personality Styles and Social Maladaptation During Adolescence” [
83], and Iñigo et al. monitored the geological evolution of building stones through chemical analysis [
84].
In 2006, Cabrera et al. employed the technique to study air pollution in Salamanca over a five-year period, capturing pollutant relationships and temporal evolution [
85], while in 2007, Alcantara and Rivas analyzed political polarization in Latin America using the
HJ-Biplot to reveal ideological dimensions [
86], and Galindo examined socioeconomic profiles of women in undeclared employment in Salamanca [
87]. In 2008, Celestino and Gonzalez applied the
HJ-Biplot to identify socioeconomic profiles of female micro-entrepreneurs in Mexico, combining it with cluster analysis for regional insights [
88], and in 2009, Castela used the method to analyze electoral turnout patterns in Portugal, while Mendez explored bacterioplankton dynamics in the Berlengas Archipelago, showcasing the method’s adaptability to diverse scientific fields [
89,
90].
In 2010, Marreiros utilized the
HJ-Biplot to classify public hospitals in Portugal based on clinical record quality and funding relationships [
91]. By 2012, Serafim et al. applied it to assess environmental contamination risks in mothers from southern Portugal by analyzing placental biomarkers [
92]. In 2013, the method proved versatile in bibliometric studies and in the analysis of greenhouse gas emissions in international companies grouped by geographical region, as demonstrated by Diaz-Faes et al., Martinez-Ferrero, and Gallego-Alvarez [
93,
94]. That same year, Torres-Salinas et al. highlighted its use in bibliometric and scientific indicator analyses [
95]. In 2014, Felicio examined the connection between governance mechanisms and the performance of publicly traded companies, while Gallego-Alvarez et al. applied the
HJ-Biplot to study corporate social responsibility in Brazilian firms and the environmental performance of 149 countries, identifying socioeconomic and institutional factors as key determinants [
96,
97,
98].
Also in 2014, the
HJ-Biplot was applied to diverse areas: Hernandez et al. studied the impact of agricultural techniques and harvesting periods on tomato quality [
99]; Herrera Ramírez et al. evaluated the nutritional requirements of tropical tree species for urban forestry [
100]; and Caballero et al. integrated qualitative and quantitative methods to analyze focus group discussions, offering a novel mathematical characterization of discourse [
69]. Additionally, Morillo et al. assessed the performance of research networking centers in psychiatry and gastroenterology [
101]. By 2015, Delgado and Galindo introduced a spatiotemporal traffic matrix analysis method using
HJ-Biplot, demonstrating improved temporal and spatial correlation representation over PCA [
102], while Egido proposed the
Dynamic Biplot to analyze economic freedom evolution in the EU, identifying greater freedom in non-eurozone countries [
74]. Ferreira et al. explored soil erodibility variations influenced by land use [
103], and Gallego-Alvarez et al. analyzed global Sustainable Society Index disparities, correlating them with geographical differences [
104].
Also in 2015, the
HJ-Biplot advanced with the introduction of the
Bootstrap HJ-Biplot by Nieto-Librero, incorporating bootstrap confidence intervals and validation through simulated and real data [
73], and its application by Ortas et al. in analyzing sustainability performance across companies in various countries [
105]. That same year, Patino-Alonso et al. identified lifestyle clusters linked to cardio-metabolic health [
106], while in 2016, Cadavid-Ruiz et al. used the technique to study executive function in children [
107], and Suarez et al. explored bioactive compounds in tomatoes with the innovative
Compositional HJ-Biplot [
108]. By 2017, applications expanded further: Alende and García employed the
HJ-Biplot for analyzing trends in preventive journalism [
109], Amor et al. linked cultural values to CSR practices [
110], and Nieto-Librero et al. developed the
Clustering Disjoint HJ-Biplot to classify pollution patterns [
111]. Additionally, Tejedor-Flores et al. combined it with MuSIASEM to examine energy consumption in Ecuador, showcasing the method’s utility in sustainability studies [
112].
In 2018, the
HJ-Biplot was extensively applied across diverse fields, with Amor et al. analyzing sustainability behaviors influenced by mimetic forces and legal systems in CSR practices [
113,
114], while Cubilla-Montilla et al. explored the role of cultural values in improving corporate transparency on human and labor rights issues [
115]. Fernandes et al. used the method to address innovation challenges in Portugal [
116], and Gallego-Alvarez et al. evaluated environmental performance in Latin America [
117]. The technique also revealed climatic and soil impacts on grape composition in the Rioja appellation [
118] and improved heterogeneity analysis in diagnostic test meta-analyses [
119]. It demonstrated its advantages in studying executive functions in Colombian children [
120] and disaggregating agricultural data [
121]. In 2019, applications included water quality evaluation [
122], emotional and intelligence aspects in digital education [
123], and CSR policy analysis [
124]. Other studies quantified the health impacts of air pollution in Ecuador [
125] and examined innovation in Portuguese start-ups [
126], showcasing the
HJ-Biplot’s adaptability and effectiveness.
In the last few years, the
HJ-Biplot has demonstrated remarkable versatility, being applied across diverse fields. Studies have explored educational strategies [
127], analyzed CO₂ emissions and family business succession challenges [
128,
129], and examined correlations between environmental performance, e-government, and corruption [
130]. It has been employed to study arterial hypertension risks, corporate sustainability indicators, and innovative behaviors in Ecuadorian universities [
131,
132,
133]. Applications include analyzing gastric cancer risks linked to Helicobacter pylori in Ecuador [
134], innovative time-series analysis techniques [
135], and assessing the Chilean economy’s focus on common good development [
136]. Over the same period, the
HJ-Biplot has seen innovative advancements and diverse applications. Alvarez and Griffin introduced the
GH-Biplot to address multicollinearity in multivariate regression, while Cubilla-Montilla developed the
Sparse HJ-Biplot for large datasets, and Martinez-Regalado integrated the
HJ-Biplot with machine learning for CSR analysis [
72,
137,
138]. Studies explored economic autonomy among Latin American women, academic performance in Chile, and strategic management in higher education [
139,
140,
141]. During the COVID-19 pandemic, researchers applied the
HJ-Biplot to assess health risks, vaccine strategies, and related conditions, highlighting its utility in managing public health crises [
142]. Other works analyzed pension systems in Latin America, food supply impacts on non-communicable diseases in Ecuador, and neuroendocrine tumor survival rates, showcasing the technique’s adaptability in addressing societal and health challenges [
143,
144,
145].
In recent studies, the
HJ-Biplot has demonstrated its versatility in addressing diverse challenges. Pilacuan-Bonete et al. integrated it with the Latent Dirichlet Allocation model to analyze COVID-19-related digital news, while Ramiro Miranda et al. evaluated the dietary intake of polyphenols in postpartum women, linking it to nutritional profiles [
146,
147]. Applications also included assessing COVID-19 vaccination progress in the Americas and Europe as well as introducing disjoint biplots for sustainability analysis [
148,
149]. Ruiz-Toledo et al. examined the positioning of Latin American universities in global rankings, correlating variables like funding and scientific output, while Torres García et al. analyzed neuropsychological impacts and depression in gender violence victims using this method [
150,
151]. In 2023, Cano et al. applied the
HJ-Biplot to model radiological content in construction materials, Crespo et al. linked crime rates to socioeconomic factors in Ecuador, and Ferreira et al. combined the technique with machine learning to study driving behavior and emissions, further showcasing its adaptability in multivariate analyses [
152,
153,
154].
Recent advancements highlight the
HJ-Biplot’s continued evolution and applications. Gonzalez-Garcia et al. introduced the
Cenet HJ-Biplot, combining restricted singular value decomposition and elastic net penalization for the improved representation of high- and low-dimensional matrices [
71]. Studies also explored the pandemic’s impact on tourism and the SDGs globally, corporate commitment to sustainability post-COVID, and trends in agroforestry research [
155,
156,
157,
158]. In 2024, Almorza et al. applied the
HJ-Biplot to maritime inspections, while Duran-Ospina used it in bibliometric studies on keratomycosis [
159,
160]. Ramos-Veintimilla analyzed genetic families of
Juglans neotropica, emphasizing its value in forest genetics [
161]. Saez-Lopez examined gamification in education, revealing limited teacher adoption, and Silva and Freitas enhanced time-series analysis with the
ESSA HJ-Biplot [
162,
163]. Applications extended to environmental analysis, from Ecuador’s water quality to the environmental impact of food products, offering critical insights for sustainability [
164,
165].
The chart provided in
Figure 10 breaks down the areas in which the
HJ-Biplot has been applied, distributing a total of 121 articles across various disciplines. The largest share is found in the field of
Health, with 18 articles, representing approximately 14.9% of the total. This is followed by the areas of
Environmental Sciences with 15 articles (12.4%),
Sustainability with 14 articles (11.6%), and the seminal article and its extensions with 10 articles (8.3%). Additionally, other areas like
Management (9 articles, 7.4%),
Education (8 articles, 6.6%), and
Economics (7 articles, 5.8%) also show a notable level of application for this technique. Other applications include more specific disciplines such as
Bibliometrics (6 articles, 5.0%),
Psychology (6 articles, 5.0%),
Agronomy (5 articles, 4.1%), and
Oenology (4 articles, 3.3%). The areas with the fewest publications include Criminology, Politics, Journalism, Data Networks, Neuromedicine, and Silviculture, each with fewer than 4 articles, representing a smaller percentage of the total. This analysis indicates a diversification in the applications of
HJ-Biplot with a significant trend toward its use in health, sustainability, and environmental topics.
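The percentages quoted above follow directly from the article counts and the total of 121. The short check below recomputes them. The counts are copied from the text, the dictionary layout and variable names are illustrative, and the areas reported only as having fewer than 4 articles are omitted because their exact counts are not given.

```python
TOTAL_ARTICLES = 121  # total stated in the text

# Counts as reported in the paragraph above.
counts = {
    "Health": 18,
    "Environmental Sciences": 15,
    "Sustainability": 14,
    "Seminal article and extensions": 10,
    "Management": 9,
    "Education": 8,
    "Economics": 7,
    "Bibliometrics": 6,
    "Psychology": 6,
    "Agronomy": 5,
    "Oenology": 4,
}

# Share of the total, rounded to one decimal place as in the text.
shares = {area: round(100 * n / TOTAL_ARTICLES, 1) for area, n in counts.items()}

for area, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{area:32s} {n:3d} {shares[area]:5.1f}%")

listed = sum(counts.values())
print(f"Listed areas cover {listed} of {TOTAL_ARTICLES} articles")
```

The listed areas account for 102 articles, leaving 19 for the remaining disciplines.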
Table 7 presents a summary of the different extensions of the
HJ-Biplot technique over the years. It starts in 1986 with the original development of the
HJ-Biplot in [
12]. From 2015 onwards, the emergence of several extensions can be observed, such as the
Dynamic Biplot [
74] and the
Bootstrap HJ-Biplot in [
73], which were followed by other significant variants such as the
Compositional HJ-Biplot [
108],
Clustering Disjoint HJ-Biplot [
111],
SSA HJ-Biplot [
135],
Sparse HJ-Biplot [
72],
Cenet HJ-Biplot [
71], and finally the
ESSA HJ-Biplot in [
163].
Figure 11 shows a detailed, manually created timeline of the development of the
HJ-Biplot technique, from its inception in 1986 by Galindo to its most recent extensions and applications in 2024. The top and bottom parts of the graph detail the various applications of the
HJ-Biplot and its extensions in different areas of knowledge over time. Since its creation, the
HJ-Biplot has found applications in areas such as
Environmental Sciences,
Health,
Oenology,
Bibliometrics,
Economics, and
Management. As extensions of the technique were developed, the variety and specificity of application areas increased. This graph highlights how the use of the
HJ-Biplot has evolved to include a multidisciplinary approach, covering both traditional applications and innovative fields in emerging areas. Additionally, it highlights not only the chronological evolution of the
HJ-Biplot and its extensions but also the diversification of its applications, demonstrating its continued relevance and expansion in academic and applied research over nearly four decades.
4. Discussion
The
HJ-Biplot has proven to be a powerful and versatile tool in the analysis of multivariate data with notable applications in fields such as health, sustainability, and socioeconomic studies. In the field of health, for example, its ability to identify complex patterns and relationships has enabled significant advances in the identification of risk profiles and the design of personalized interventions. A notable study is [
106], which used the
HJ-Biplot to analyze lifestyle clusters associated with cardiometabolic risks, providing valuable insights for the prevention of chronic diseases.
In the field of sustainability, the
HJ-Biplot has been employed in various applications that analyze environmental, economic, and social indicators. For example, in [
104], this technique was applied to evaluate the
Sustainable Society Index, providing a clear visual representation of countries with the best performance in sustainability. In [
110], the
HJ-Biplot was used to analyze the relationship between corporate social responsibility and financial performance in European companies, highlighting how sustainable practices can positively influence economic outcomes. Another relevant example is [
117], where the
HJ-Biplot was applied to assess the impact of environmental policies on greenhouse gas emissions reduction in different regions of the world, identifying the most effective strategies to combat climate change. These studies demonstrate the versatility of the
HJ-Biplot in addressing complex problems in the field of sustainability, providing valuable insights for decision making in public policies and corporate strategies.
In socioeconomic studies, the
HJ-Biplot has been used to analyze survey data and employment profiles, as in the study [
87] on women in irregular employment situations in Salamanca. However, its application in fields such as engineering and technology is less common, despite its potential to address complex problems in these areas. For example, the
HJ-Biplot could be used to analyze sensor data in autonomous vehicles or to optimize processes in smart mining, where the identification of patterns and relationships could improve efficiency and decision making. These applications represent opportunities to expand the use of the
HJ-Biplot in interdisciplinary and emerging contexts.
An innovative aspect of this work is the combination of text mining techniques and artificial intelligence to analyze large volumes of information. By using tools such as IRAMUTEQ and ChatGPT 3.5, key insights were extracted and synthesized from 121 studies, identifying emerging trends and application areas that might have gone unnoticed with traditional methods. For example, the context provided by ChatGPT 3.5 enabled the generation of a detailed timeline summarizing the evolution of the HJ-Biplot from its introduction in 1986 to the most recent innovations, such as the Sparse HJ-Biplot and the Cenet HJ-Biplot. Additionally, text mining facilitated the identification of thematic patterns in the abstracts of the articles, enriching the analysis and providing a more comprehensive view of the contributions and limitations of the technique. This combination of methods not only enriched the context of the information obtained but also demonstrated the potential of artificial intelligence to complement and strengthen data analysis in systematic reviews.
Despite its wide applicability, the HJ-Biplot faces several challenges that limit its use in certain contexts. One of the main challenges is the integration of heterogeneous data, such as the combination of genomic and clinical data in health studies. Although extensions like the Sparse HJ-Biplot have proven useful for dimensionality reduction in these cases, their interpretation remains complex and requires validation by experts in the field. Another challenge is scalability in large datasets, where techniques such as deep learning could complement the HJ-Biplot to improve the efficiency and accuracy of the analysis. Additionally, the lack of software tools that integrate the HJ-Biplot with artificial intelligence techniques limits its applicability in emerging fields such as data mining and process optimization.
A key aspect in the future development of the HJ-Biplot is the complementarity between its different extensions, such as the Sparse HJ-Biplot and the Cenet HJ-Biplot. The Sparse HJ-Biplot excels in applications requiring dimensionality reduction, such as in genomic data analysis, where its ability to identify key variables is invaluable. On the other hand, the Cenet HJ-Biplot offers robustness and flexibility in handling high- and low-dimensionality matrices, being particularly useful in the evaluation of public policies, where it allows for a precise representation of relationships between variables. These complementary characteristics open the door to future developments that integrate both techniques, as well as the exploration of innovative approaches, such as the use of functional data or disjoint techniques, which could further expand the capabilities of the HJ-Biplot to address complex problems in various fields. The combination of these extensions and the incorporation of new methodologies promise to enrich multivariate analysis, offering more versatile and adaptable tools to the current needs of research.
To overcome these limitations and fully leverage the potential of the HJ-Biplot, several directions for future research are proposed. First, it is crucial to explore the integration of the HJ-Biplot with machine learning techniques, such as penalized principal component analysis (Sparse PCA) and neural networks, to improve its scalability and applicability in large datasets. Second, comparative studies of the different extensions of the HJ-Biplot, such as the Sparse HJ-Biplot and the Cenet HJ-Biplot, are recommended to identify their advantages and limitations in different contexts. Finally, the development of software tools that integrate the HJ-Biplot with artificial intelligence techniques would facilitate its use in interdisciplinary and big data applications, thereby expanding its reach and relevance in current research.
The HJ-Biplot is an invaluable tool for multivariate data analysis with demonstrated applications in multiple disciplines. However, its full potential has yet to be realized, and it is necessary to address challenges such as the integration of heterogeneous data, scalability in large datasets, and the lack of advanced software tools. By overcoming these limitations, the HJ-Biplot could establish itself as an indispensable tool in data analysis, offering innovative solutions to complex problems in research and professional practice.
5. Conclusions
Through an exhaustive analysis of the literature employing Galindo’s HJ-Biplot technique, 121 articles published from 1986 to the present were identified, allowing the mapping of the evolution and diversification of this technique over time. The reviewed studies reveal both significant theoretical contributions and applications in diverse and high-impact areas, such as Health, Environment, Sustainability, Management, and Economics. These fields highlight the versatility of the HJ-Biplot in addressing complex and multidimensional issues. Furthermore, it was observed that Spain, Ecuador, Portugal, Colombia, and Chile lead in publication volume, underscoring the relevance of these regions in the development and application of the technique. This geographical focus also suggests possible collaborations and areas for expansion in future studies.
The integration of artificial intelligence through the use of ChatGPT 3.5 added an extra layer of analysis to text mining, specifically with the Canonical Biplot, enriching the context of the data and facilitating the identification of patterns and trends in the reviewed literature. Artificial intelligence proved to be a valuable resource by automating classification processes and semantic analysis, optimizing time, and enabling deeper processing of large volumes of information. However, it is essential to recognize that the use of these tools is not without limitations. The results largely depend on the approach and initial parameters set by the researcher, which could introduce biases or partial interpretations depending on the methodological decisions made at each stage of the analysis.
Although artificial intelligence has shown great potential in processing and analyzing textual data, it is crucial that the generated data be interpreted with caution. The dependence of AI on learning models trained with prior information may limit its objectivity and, in some cases, introduce automated interpretations that do not necessarily capture the complexity or specific context of the studies analyzed. This underscores the need for researchers to complement AI-generated results with a critical and contextualized review, minimizing potential biases and achieving a more comprehensive and balanced interpretation of the information.
This study, structured into seven methodological phases, provides a robust framework for conducting systematic literature reviews using the HJ-Biplot technique. The proposed methodology facilitates the organization and analysis of a high volume of articles, enabling a comprehensive view of the technique’s evolution and applications. Despite the density and complexity of the data analyzed, the methodological approach implemented in this work simplifies the identification of key trends and contributions in the field, establishing itself as a valuable tool for researchers interested in multivariate analysis and text mining. Additionally, this methodological structure can serve as a replicable model for other studies aiming to thoroughly analyze the development of techniques or methodologies within specific research areas.
This study not only documents the historical development and applications of the HJ-Biplot technique but also provides practical recommendations for researchers, particularly those less familiar with this methodology. To make the most of the HJ-Biplot, it is suggested to start with manageable datasets where its ability to visualize multivariate relationships can provide clear and actionable insights. However, it is important to acknowledge its limitations, such as difficulties in handling synonyms, polysemy, or highly dimensional data, which may require the use of extensions like Sparse HJ-Biplot or integration with machine learning techniques.
The hybridization of HJ-Biplot with artificial intelligence tools, such as automated semantic analysis, represents a significant opportunity to overcome these limitations. For example, the use of natural language processing (NLP) models could improve the interpretation of complex texts, while dimensionality reduction algorithms, such as penalized PCA (Sparse PCA), could optimize the analysis of large datasets. For researchers looking to explore emerging fields, HJ-Biplot is recommended as a complementary tool in interdisciplinary studies, where the visualization of multidimensional data can enrich analysis and decision making. Finally, researchers are encouraged to explore interregional collaborations and expand the use of HJ-Biplot in less conventional contexts, such as urban sustainability, digital health, or process engineering. By adopting a practical approach and being mindful of its limitations, HJ-Biplot can establish itself as an accessible and powerful tool for addressing complex problems in current research.