1. Introduction
The use of artificial intelligence (AI) and machine learning (ML) in the financial sector is continuously expanding and transforming, with a profound impact on industry and society [1,2]. From traditional financial institutions, such as investment and retail banks or hedge fund management firms (for example, JPMorgan Chase), to new players in financial technology such as Revolut, AI and ML are widely used to optimize operations and improve customer service [3].
For example, JPMorgan Chase has deployed the Contract Intelligence (COiN) platform, an AI-powered system that automates the analysis of legal documents. COiN can process and extract critical data from complex contracts in seconds, saving an estimated 360,000 hours of manual labor annually and significantly reducing the risk of human error [4]. In the FinTech sector, Revolut uses machine learning algorithms to detect fraudulent behavior and protect customers from scams. The company has launched an advanced fraud detection feature that identifies suspicious transactions in real time and prevents financial losses [5].
This digital transformation trend is also supported by the rapid expansion of the global market for AI in the banking sector. The market was estimated at USD 19.90 billion in 2023, rose to USD 26.23 billion in 2024, and is projected to reach around USD 315.50 billion by 2033, a compound annual growth rate (CAGR) of 31.83% from 2024 to 2033 [6]. This expansion is fueled by the accelerated digitalization and modernization of the banking sector, as well as the increasing adoption of advanced technologies by financial institutions [7,8,9,10].
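The growth rate quoted above can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The following is a minimal sketch using only the figures cited in the text; the function name `cagr` is illustrative and not part of the cited source.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` annual periods."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text (USD billions); 2024 -> 2033 spans 9 annual periods.
rate = cagr(26.23, 315.50, 2033 - 2024)
print(f"Implied CAGR 2024-2033: {rate:.2%}")
```

The implied rate is about 31.8%, which is consistent with the reported CAGR.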
Artificial intelligence (AI) and machine learning (ML) have begun to play an increasingly important role in the banking sector, transforming it and significantly improving efficiency and decision-making [11,12,13]. AI, which refers to the use of advanced technologies to enable systems to learn, process and make autonomous decisions, has been integrated into numerous banking operations, such as the use of machine learning algorithms for fraud detection, the automation of credit-granting processes through intelligent scoring or the deployment of virtual assistants to improve customer interaction [14].
AI-driven technologies enable banks to streamline operations, reduce costs, and enhance risk management strategies. For instance, AI-powered chatbots provide 24/7 customer support, handling routine inquiries and allowing human agents to focus on complex issues. Additionally, AI enhances regulatory compliance by automating the monitoring of transactions to detect money laundering activities, ensuring adherence to financial regulations. In investment banking, AI-powered trading algorithms analyze market trends in real time, executing trades at optimal moments to maximize returns and minimize risks. These applications demonstrate how AI is revolutionizing banking by making operations more secure, efficient, and customer-centric [7,10].
In this context, machine learning (ML), a branch of AI, has become an essential tool, helping banks to analyze large volumes of data, identify patterns and predict financial market fluctuations [15,16]. Thus, ML is actively used for fraud prevention by detecting suspicious transactions in real time based on classification and anomaly recognition algorithms, for assessing customer creditworthiness through predictive models that analyze financial history and payment behavior, and for personalizing financial offers through recommendation systems that use clustering to tailor banking products to individual customer needs [17,18,19,20].
Over the decades, the use of artificial intelligence (AI) and machine learning (ML) technologies in the banking sector has undergone a profound transformation. Early initiatives in the 1980s, such as expert systems used for financial advice, laid the foundation for modern digitization, and the introduction of the FICO score in 1989 marked a turning point in the standardization of credit risk assessment in financial institutions around the world. The evolution of these technologies aligns with the directions identified in the recent literature, including the analysis by the Congressional Research Service, which describes how AI has matured from rule-based models to advanced systems capable of supporting critical processes such as fraud prevention, risk assessment, and financial decision automation [21]. At the same time, recent research, such as the study by Islam et al. [22], points to the catalytic role of global events, for example, the COVID-19 pandemic, in accelerating the adoption of AI and ML in key areas, including financial services, and in facilitating the transition to integrated digital ecosystems oriented towards the concept of Society 5.0.
Although the use of artificial intelligence (AI) and machine learning (ML) techniques offers substantial opportunities for increasing efficiency and innovation in the banking sector, the literature also highlights significant risks that require increased attention and appropriate management mechanisms. Durongkadej et al. [23] demonstrate that AI-related incidents can directly affect the performance and reputation of financial institutions, leading to operational volatility and increased exposure to technological risks. Similarly, the analysis carried out by the European Central Bank highlights that the rapid adoption of AI can introduce systemic vulnerabilities, including risks related to opaque models, technological dependence, and potential malfunctions that can threaten macroeconomic financial stability [24]. At the same time, the study conducted by Naveed et al. [25] on large language models indicates a number of specific risks, such as the generation of erroneous content, algorithmic bias, and exposure to cyber attacks, which have a direct impact on the security of automated banking processes.
In particular, the collection and use of large amounts of sensitive customer data raises major privacy and security concerns [26]. AI algorithms depend on access to such data to function efficiently, and any security breach could have serious consequences, including financial losses and reputational damage for banking institutions. In this context, compliance with data protection regulations, such as the European Union’s General Data Protection Regulation (GDPR), becomes essential [27]. Furthermore, the European Union has initiated the development of specific regulations for AI aimed at setting clear boundaries for the use of these technologies [28].
The aim of this paper is to conduct a detailed bibliometric analysis of Artificial Intelligence (AI) and Machine Learning (ML) concepts in banking. Through this analysis, we aim to identify research trends, key developments and models used in the field. We will also examine existing gaps in previous research and highlight future research directions, proposing new approaches and insights to deepen our understanding of the impact of these technologies on efficiency and innovation in the banking sector. A bibliometric analysis is particularly relevant at this stage, given the rapid expansion and fragmentation of the literature on AI and ML in banking, which makes it necessary to systematically map research streams, identify underexplored areas, and assess how current contributions differ from or extend existing reviews. To this end, we propose a series of research questions that will be addressed throughout this paper:
RQ1: How has the publication of academic articles on the use of artificial intelligence and machine learning in the banking sector evolved according to data from Scopus and Web of Science databases?
RQ2: What are the main emerging research directions on the use of artificial intelligence and machine learning in the banking sector, as identified through keyword network analysis?
RQ3: Who are the authors with the highest scientific contributions in the field of artificial intelligence and machine learning applied to the banking sector, according to publications indexed in Scopus and Web of Science?
RQ4: Which research institutions have had the greatest impact on the development of artificial intelligence and machine learning research in the banking sector?
RQ5: How is research on artificial intelligence and machine learning in the banking sector geographically distributed and which countries have the most intense scientific activity in this field?
RQ6: Which academic journals publish the most influential research on artificial intelligence and machine learning in banking, by number of citations?
For a more comprehensive analysis, Principal Component Analysis (PCA) is employed in this study as a multivariate bibliometric technique to identify latent thematic structures within the keyword co-occurrence matrix. PCA, a technique closely related to factor analysis, operates on correlations within a single dataset and does not synthesize effect sizes across empirical studies. This provides an integrated perspective on the impact and emerging trends of artificial intelligence and machine learning in the banking sector.
Therefore, the study makes significant contributions to understanding the research landscape of Artificial Intelligence (AI) and Machine Learning (ML) in banking by integrating factor-analytic and bibliometric methodologies, providing a nuanced analysis that bridges quantitative statistical rigor with networked insights. A key contribution lies in its ability to synthesize a large and fragmented body of literature, identifying dominant themes like predictive analytics, customer interaction, and credit risk management, while also revealing emerging areas such as fintech and blockchain. This dual-layered approach not only validates existing findings but also uncovers latent trends and underexplored niches, offering a richer perspective than either method could achieve independently.
The paper is organized into six sections. Following this introduction:
Section 2 reviews the relevant literature;
Section 3 explains the methodology and data;
Section 4 focuses on interpreting the results;
Section 5 discusses the results; and
Section 6 concludes with implications for practitioners, directions for future research, and the study’s limitations.
4. Results
This section addresses RQ1 by analyzing the evolution of academic articles published to date, aiming to highlight the trends and developmental stages of research on the use of artificial intelligence and machine learning in the banking sector. The number of publications indexed in the Web of Science (WOS) and Scopus databases over the years 2013–2025 shows a marked overall increase in both databases, indicating a growing interest in the topics analyzed in this study.
In the WOS database (Figure 2), there is a steady increase in the number of publications, with a significant jump in 2020, when the number of published articles reached 28. This is followed by a rapid increase in 2021 and 2022, when publications reached 46 and 56, respectively, reflecting a growing research interest in this sector. In 2023 and 2024, the number of published articles continued to increase, reaching 63 and 79, respectively, suggesting an expansion of the thematic focus and deepening research. This upward trend can be correlated with the rapid development of emerging technologies and their applicability in various economic domains, including the banking sector.
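To make the pace of this growth explicit, year-over-year rates can be derived from the WOS counts reported above. This is a minimal sketch; only the five counts come from the text, and the helper name `year_over_year_growth` is illustrative.

```python
# WOS publication counts as reported in the text for 2020-2024.
wos_counts = {2020: 28, 2021: 46, 2022: 56, 2023: 63, 2024: 79}


def year_over_year_growth(counts):
    """Relative change of each year's count versus the previous listed year."""
    years = sorted(counts)
    return {
        year: counts[year] / counts[prev] - 1
        for prev, year in zip(years, years[1:])
    }


growth = year_over_year_growth(wos_counts)
for year, rate in growth.items():
    print(f"{year}: {rate:+.1%}")
```

Relative rates complement the absolute counts: the largest relative increase in this window occurs between 2020 and 2021, even though the largest absolute count is reached in 2024.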
Regarding the Scopus database (Figure 2), the number of publications follows a similar trend, but with lower values than in WOS, indicating somewhat narrower coverage of the topic. For example, in 2017 and 2018, the number of articles published in Scopus is significantly lower than in WOS (1 and 5 articles, respectively), but from 2019 onwards, a gradual increase is observed, reaching 6 articles in 2020 and 19 articles in 2021. After a dip to 11 publications in 2022, the count rises to 27 in 2023, suggesting a wider expansion of the topic in academic research.
The topic was not widely researched in the early years of the analyzed period, which can be explained by the fact that the integration of artificial intelligence and machine learning in the banking sector is a relatively recent trend, and the applications of these technologies have only started to be explored more intensively in recent years. Interest in the topic has thus grown significantly as the technologies have evolved and been deployed more frequently in industry.
4.1. Co-Occurrence Network Analysis of Keywords
This section addresses RQ2 by examining the emerging trends in research on the use of artificial intelligence and machine learning in the banking sector, as identified through the analysis of keyword networks from relevant scholarly works. The co-occurrence map of the keywords “artificial intelligence” and “machine learning in banking” illustrates the relationships and frequency of connections between relevant terms, highlighting key research trends and emerging areas of interest in the application of these technologies within the banking sector.
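As an illustration of how a keyword co-occurrence network is typically built, the sketch below counts how often two keywords appear together in the same paper; each pair becomes an edge whose weight is that count. The paper records are hypothetical and the function name `cooccurrence_counts` is illustrative; this is not the tool or the data used in the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one per paper (illustrative only).
papers = [
    ["machine learning", "credit risk", "classification"],
    ["artificial intelligence", "machine learning", "fraud detection"],
    ["Machine Learning", "credit risk", "neural networks"],
    ["artificial intelligence", "blockchain"],
]


def cooccurrence_counts(keyword_lists):
    """Count unordered keyword pairs that appear together in one paper."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates so each pair counts once per paper.
        unique = sorted({k.strip().lower() for k in keywords})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


edges = cooccurrence_counts(papers)
for (first, second), weight in edges.most_common(3):
    print(f"{first} -- {second}: {weight}")
```

Pairs are stored in alphabetical order so that ("a", "b") and ("b", "a") are treated as the same edge. Clustering and layout, as shown in the keyword maps, would be applied on top of such an edge list.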
The keyword map (Figure 3) generated from the Web of Science database underlines the central importance of the terms “artificial intelligence” and “machine learning” in the context of banking, as they are connected to both practical aspects of the field, such as performance evaluation and credit risk analysis (purple cluster), and advanced predictive methods, such as prediction, models and classification (red cluster). This highlights the use of these technologies for decision automation and process optimization.
The purple cluster indicates a direct application in credit risk management, performance analysis and the use of data-driven analytics in the banking sector. In parallel, the red cluster shows the relevance of predictive methods, where neural networks, classification and extreme learning machines play a key role in fraud detection, customer classification and bankruptcy prediction.
In terms of innovation, the blue cluster, which includes terms such as “innovation”, “technology” and “growth”, reflects an emerging synergy between AI and technological developments in banking. Likewise, the orange cluster, with a focus on “blockchain” and “impact”, suggests the integration of blockchain technologies within AI-enabled solutions, supporting security and efficiency.
On the other hand, the keyword map (Figure 4) generated from the Scopus database is structured around central nodes and related subtopics, reflecting the areas of interest and emerging research directions. Dominant concepts, such as machine learning and artificial intelligence, occupy central positions in the network and are connected with most sub-themes. This centrality emphasizes the importance of these technologies in current research, both as fundamental methods and as solutions applicable in various contexts.
The network also highlights the existence of distinct thematic clusters, each with a specific role in understanding and applying these concepts. For example, the blue cluster, associated with the term “banking”, connects sub-themes such as “predictive analytics”, “on-line banking” and “customer churns”, suggesting that machine learning plays a key role in improving banking processes and making predictions about customer behavior. Meanwhile, the red cluster, centered on “artificial intelligence”, integrates subtopics such as “learning algorithms”, “language processing” and “financial service”, indicating the extensive use of artificial intelligence algorithms in financial services and natural language processing.
Another example is the green cluster, which contains concepts such as “risk management”, “credit risks” and “decision-making”, demonstrating the application of big data and artificial intelligence methods in risk assessment and decision support in the banking sector. In addition, the brown cluster includes themes such as “artificial intelligence technologies” and “risk early warning”, pointing to the use of these technologies in early warning systems to facilitate effective risk management.
In addition to analyzing the thematic clusters, the map also reveals strong links between the core concepts. The relationships between “machine learning” and “banking” are particularly robust, suggesting the intensive use of machine learning algorithms for optimizing banking processes. Similarly, the connection between “artificial intelligence” and “learning systems” reflects continued progress in the development of adaptive intelligent systems capable of responding effectively to complex challenges.
At the same time, the emerging trends highlighted by this network are particularly relevant for research. Sub-themes such as “customer satisfaction” and “digital banking” illustrate a growing interest in improving the customer experience through digital technologies. Also, the presence of the term “systematic literature review” connected with the various sub-themes suggests that researchers are constantly striving to analyze the existing literature in order to identify research gaps and opportunities.
For a more detailed exploration of the relationships between keywords, the analysis uses factor maps, which allow the identification of thematic clusters and significant connections. These maps provide an in-depth insight into emerging research trends and areas, facilitating an understanding of the distribution and interdependence of concepts.
The bibliometric factor analysis performed with keywords from the Web of Science database reveals the relational structure between relevant concepts, using semantic similarities to highlight the main research directions in the field of artificial intelligence and machine learning in banking. The two main dimensions, Dimension 1 (20.49%) and Dimension 2 (17.85%), together explain approximately 38.34% of the total variance in the data, providing a solid basis for thematic interpretation.
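To illustrate how two dimensions and their shares of explained variance can be obtained from a keyword co-occurrence matrix, the following is a minimal PCA sketch in plain Python. The matrix values are invented for illustration, the function names are hypothetical, and power iteration with deflation is used here only to keep the example dependency-free; it is not the software or procedure used in the study.

```python
import math

# Hypothetical keyword co-occurrence counts (symmetric, zero diagonal).
# These numbers are invented for illustration and are NOT taken from the study.
KEYWORDS = ["machine learning", "artificial intelligence", "credit risk",
            "fraud detection", "blockchain", "fintech"]
COOCCURRENCE = [
    [0, 12, 8, 7, 1, 2],
    [12, 0, 6, 5, 2, 3],
    [8, 6, 0, 4, 0, 1],
    [7, 5, 4, 0, 0, 1],
    [1, 2, 0, 0, 0, 5],
    [2, 3, 1, 1, 5, 0],
]


def center_columns(rows):
    """Subtract each column mean so every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(matrix, vec):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def dominant_eigenpair(matrix, iterations=1000):
    """Largest eigenvalue and unit eigenvector via power iteration."""
    vec = [float(i + 1) for i in range(len(matrix))]  # arbitrary non-zero start
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))  # Rayleigh quotient
    return value, vec


def pca(rows, n_components=2):
    """Return per-row scores and the share of variance explained per component."""
    centered = center_columns(rows)
    cov = covariance(centered)
    size = len(cov)
    total_variance = sum(cov[i][i] for i in range(size))
    components, shares = [], []
    for _ in range(n_components):
        value, vec = dominant_eigenpair(cov)
        components.append(vec)
        shares.append(value / total_variance)
        # Deflation: remove the component just found before extracting the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(size)]
               for i in range(size)]
    scores = [[sum(x * w for x, w in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, shares


scores, shares = pca(COOCCURRENCE)
for keyword, (dim1, dim2) in zip(KEYWORDS, scores):
    print(f"{keyword:25s} {dim1:8.2f} {dim2:8.2f}")
print("explained variance shares:", [f"{s:.2%}" for s in shares])
```

The two scores per keyword play the role of coordinates on a factor map, and the two shares correspond to the percentages reported for Dimension 1 and Dimension 2. Their sum is the cumulative share explained by the two-dimensional map.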
The clustering of terms on the map (Figure 5) suggests the existence of four main regions, each with relevant thematic specificities. In the top left region, we identify terms such as knowledge, growth and innovation, which reflect theoretical and conceptual concerns, suggesting studies of technological progress and innovation in the financial industry. This region is strongly connected to the bottom left area, which includes terms such as ensemble algorithms, feature selection and framework. This connection indicates a focus on the development and refinement of technical machine learning methodologies to support progress in this area.
In parallel, the top right area includes terms such as big data, systems, credit risk, fintech and strategy, which emphasize the applicability of new technologies in financial data analysis, risk management and the definition of banking strategies. This cluster clearly shows how machine learning and artificial intelligence technologies are being used to solve operational and risk analytics challenges in banking, reinforcing the links between technological and application aspects.
At the same time, the bottom right cluster highlights direct applications of machine learning algorithms in finance. Terms such as bankruptcy prediction, neural networks and classification algorithms suggest a focus on the development of advanced predictive models used to detect critical situations such as bankruptcy or risk classification of customers and banking operations. This region confirms the financial industry’s orientation towards automation and efficiency through the use of predictive technologies.
A particularly interesting element is the relatively central positioning of the terms artificial intelligence and machine learning, which marks them as connecting points between the different thematic clusters. This position reflects the cross-cutting nature of these technologies, which integrate conceptual and methodological research as well as practical applications. Their centrality underlines the fact that AI and machine learning represent a common basis for all the themes explored in this review.
Beyond these observations, the relationships identified in the map highlight some important trends. For example, terms associated with theoretical underpinnings, such as knowledge and growth, indicate concerns about the accumulation and capitalization of knowledge, while the presence of terms such as blockchain and impact reflects interest in emerging technologies that, together with artificial intelligence, could contribute to innovations in the security and efficiency of banking operations. Also, the connections between credit risk, performance and analytics suggest that financial risk and performance measurement remain central research themes, highlighting the practical value of predictive technologies.
The map in Figure 6, generated from the Scopus database, shows two dimensions that together explain 22.38% of the total variance, with 14.62% attributed to Dimension 1 and 7.76% to Dimension 2, suggesting a relatively broad distribution of the analyzed themes. Dimension 1 seems to reflect an axis of emerging technologies and processes used in banking and artificial intelligence. Themes associated with this dimension, such as “digital banking”, “predictive analytics” and “blockchain”, highlight the innovative application of these technologies in the financial industry. In contrast, Dimension 2 can be interpreted as an operational and behavioral impact axis, including concepts such as “credit risks”, “behavioral research” and “accuracy assessment”, which highlight decision-making and performance evaluation aspects of AI-based applications.
The thematic distribution is characterized by a number of significant observations. Some concepts, such as “blockchain” and “5G mobile communication systems”, are located at the periphery of the diagram, indicating a relatively isolated positioning with respect to the central themes. This suggests that these emerging topics are not yet fully integrated into the prevailing body of literature. In contrast, terms such as “artificial intelligence”, “machine learning” and “credit risks” are concentrated in the core area, reflecting a strong link between these themes and their relevance in existing research. This central concentration emphasizes the importance of collaboration between advanced technologies and financial practices in the current literature.
The thematic interactions provide a deeper understanding of how these concepts are interrelated. The proximity between the terms “predictive analytics” and “digital banking” points to the application of predictive analytics for process optimization in digital banking, suggesting a synergy between these fields. Also, the connection between “credit risks” and “behavioral research” highlights the growing interest in the use of artificial intelligence to analyze customer behavior and manage associated risks.
The emerging trends identified in the map indicate new directions for research and development. The emergence of the concept of “5G mobile communication systems” reveals a growing interest in the integration of advanced mobile communications in financial services, offering new opportunities for the digitization of the sector. Similarly, blockchain technology is highlighted as a distinct topic, suggesting high potential in revolutionizing banking processes. These trends underline the dynamism and rapid evolution of research in this area, highlighting the opportunities offered by new technologies in reshaping the financial sector.
4.2. Authors’ Co-Citation Network
This section addresses RQ3 by identifying the authors with the highest number of published works in the field of artificial intelligence and machine learning applied to the banking sector, based on data from relevant academic sources. The analysis of authors from the Web of Science database highlights the key contributors in the field of artificial intelligence and machine learning applied to banking, emphasizing their collaboration networks and academic impact.
Figure 7 highlights the relevance of authors according to the number of papers published, providing a clear perspective on their contribution to the field. Zhang Y stands out as the author with the most publications, having a total of four papers, while Cheng D, Krabichler T, Li J, Sharma M and Wang X each have three papers published. Also, authors such as Balland PA, Basalkevych O, Beheshti A and Benatallah B contributed two papers each, showing a relatively even distribution of publications among the relevant authors. This distribution suggests a balanced collaboration in the field without an excessive concentration of publications around a small number of researchers.
Figure 8 complements Figure 7 by showing the authors’ output over time, highlighting their contributions in the field of artificial intelligence (AI) and machine learning (ML) applied to the banking sector. The authors’ contributions are distributed over a relevant period, highlighting emerging trends and their role in advancing knowledge in the field.
Among the authors with recurring contributions are Zhang Y, Cheng D, Krabichler T, Li J, and Sharma M, whose papers are distributed consistently between 2020 and 2024. This continuity suggests sustained commitment to the topic and active involvement in applied AI/ML research in banking. Wang X also stands out with a significant volume of publications in a short time span, indicating an intensification of research activity and a possible focus on innovative aspects.
In contrast, authors such as Basalkevych O, Beheshti A and Benatallah B made sporadic contributions concentrated in specific periods. This pattern may reflect temporary involvement or research focused on limited projects, but with potentially significant impact.
The temporal evolution of academic output shows a sharp increase after 2020, coinciding with the acceleration of digitization and the increasing demand for advanced technologies in the banking sector. This trend highlights the growing relevance of AI/ML in optimizing banking processes, analyzing risks and improving customer service.
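The per-author totals and the production-over-time view discussed above both reduce to simple counting over publication records. The sketch below shows one way to derive them, assuming each record is a (year, authors) pair; the records and author labels are hypothetical placeholders, not data from the study.

```python
from collections import Counter

# Hypothetical records: (publication year, list of authors). Illustrative only.
records = [
    (2020, ["Author A", "Author B"]),
    (2021, ["Author A"]),
    (2023, ["Author C", "Author A"]),
    (2023, ["Author B"]),
]


def papers_per_author(recs):
    """Total number of papers on which each author appears."""
    return Counter(author for _, authors in recs for author in set(authors))


def production_over_time(recs):
    """Per-author count of papers for each publication year."""
    timeline = {}
    for year, authors in recs:
        for author in set(authors):
            timeline.setdefault(author, Counter())[year] += 1
    return timeline


totals = papers_per_author(records)
timeline = production_over_time(records)
for author, count in totals.most_common():
    years = ", ".join(f"{y}: {n}" for y, n in sorted(timeline[author].items()))
    print(f"{author}: {count} paper(s) ({years})")
```

In practice, author names would need disambiguation before counting, since abbreviated names can merge distinct researchers or split one researcher into several entries.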
According to Figure 9, based on data from the Scopus database, each author included in the graph is associated with two papers, the maximum value identified, which emphasizes a relatively equal contribution among the researchers analyzed. The most prominent authors include Ceron BM, Chen K, Gupta S, Gupta S, Irfan M, Kumar P, Mehrotra A, Monge M, Taneja S and Abd El-Aal MF, each with the same number of papers attributed.
The even distribution of publications across authors, with no clear leader dominating the field, suggests a collaborative nature of research in this thematic space. The lack of a marked concentration of scholarly output in a small number of researchers may reflect the diversity of contributions and approaches in the field analyzed. At the same time, the graph also reveals the presence of one author associated with a single paper, identified as NA NA, which may indicate incomplete or anonymous data.
Figure 10 shows the authors’ production over time in the Scopus database, giving an insight into the evolution of the scientific output of different authors over the years and reflecting the diversity of contributions and the dynamics of academic research in this field. The analysis highlights the constant involvement of authors such as Mehrotra A. and Kumar P., who demonstrate a sustained and consistent activity over several years.
This continuity emphasizes their long-term commitment and their essential contribution in advancing the field. In contrast, other authors, such as Chen K., stand out with productions that are limited in time but significant in volume, indicating a concentrated academic impact in the years of activity.
On the other hand, temporal analysis highlights variations in scholarly output. Some publications are notably concentrated in the recent range 2023–2024, suggesting an increased interest in emerging themes and topical issues. Authors such as Irfan M. and Gupta S. are distinguished by an intense pace of publications in short periods, suggesting either the conduct of a specific research project or a sharp increase in scholarly engagement within a delimited interval. This trend highlights particularities in the individual work of the authors, but also the influence of contextual factors on the pace of research.
Temporal diversity becomes evident in the case of authors with recent contributions, such as Taneja S. and Abd El-Aal MF., who seem to focus on contemporary or emerging research directions. Their involvement reflects the dynamism of the field of study and the tendency towards innovation in the context of current themes. At the same time, the presence of an anonymous author in the data, identified by the “NA NA” marker, suggests possible inaccuracies or errors.
4.3. Collaborative Institutional Analysis of Co-Authors
This section explores RQ4 by examining the primary research institutions that have made significant contributions to studies on artificial intelligence and machine learning in the banking sector, as highlighted in the academic sources analyzed. The collaborative institutional analysis of co-authors reveals the key partnerships between institutions, highlighting the geographical and academic networks that drive research in artificial intelligence and machine learning within the banking sector.
Figure 11 provides a detailed perspective on the evolution of key institutions’ contributions to scientific research over the period 2015–2025, highlighting their dynamics and progress in terms of the number of articles published in the Web of Science database. The analysis of this graph highlights the general trends, the performance of the top institutions, the regional contributions and the thematic alignment between them.
First, a general upward trend is observed, characterized by a sustained increase in the number of articles published by all the institutions analyzed, especially since 2017. This intensification reflects an increase in academic interest in areas such as artificial intelligence, machine learning and their financial applications. The period 2017–2025 thus marks a critical stage of research expansion, being characterized by a thematic diversification and a significant increase in the number of publications.
Among the top institutions, Kharkiv National University of Radio Electronics stands out as the leader, with a total of 13 articles published by 2025. The accelerated growth of its scientific output after 2017 highlights both a major involvement in research and a position of academic leadership in the field. Similarly, Lviv National Polytechnic University, with 11 articles by 2025, is on an upward trajectory, marking a solid contribution to the advancement of scientific knowledge. Bucharest University of Economic Studies also records a significant leap starting in 2020, reaching a total of 10 articles by 2025, indicating an intensified focus on topics related to digitalization and the economic impact of new technologies.
In addition, institutions such as the University of Electronic Science and Technology of China and the University of Information Technology, Mechanics, and Optics are notable for their steady growth, publishing 8 and 7 articles, respectively, by 2025. This trend highlights the active involvement of Asian and European research centers in the development and application of emerging technologies. At the same time, the synchronized growth of contributions from these institutions indicates a possibility of indirect collaboration or thematic alignment, which reflects a common interest in digital technologies and their impact on the global economy.
The institutional analysis from the Scopus database identifies the leading organizations contributing to research on artificial intelligence and machine learning in banking, showcasing their influence and collaboration networks within the field.
One of the notable trends is the clear rise of Amity University (
Figure 12), which, compared to 2019, has demonstrated a steady and rapid increase in scientific production. This institution has come to dominate the academic landscape of the analyzed field, reaching a maximum of six articles published in 2024.
On the other hand, Ahlia University stands out with a slower but steady growth, which suggests a stable and relevant presence in the academic landscape. This evolution is also supported by other institutions such as Guglielmo Marconi University and the University School of Business, which have continued to contribute with a moderate number of publications. These increases highlight a diversified expansion in which several institutions find an important role in the development of the analyzed field.
At the same time, the analysis reveals significant differences in the pace of contribution between institutions. Harvard University, University and University School of Business present intermittent, albeit modest, contributions, which may reflect either a focus on specific research niches or the prioritization of other academic fields. This temporal variability adds a level of complexity to the academic landscape, suggesting that methodological and thematic diversity is a characteristic element of this analysis.
Also of interest is the diversified international affiliation. Institutions such as the University of Delhi and the Universidad Francisco de Vitoria demonstrate an increasing commitment to research in the field studied, indicating increasing internationalization and cross-border collaboration in the production of scientific works. This diversity highlights the global impact of the research topic and its ability to attract the interest of a wide range of institutions.
An important peculiarity is the significant increase in contributions in the period 2020–2024, which coincides with the global pandemic context. These recent dynamics suggest an intensification of interest in the research topics analyzed, probably driven by the transformations and challenges the pandemic brought to the sector. Institutions have found opportunities for academic exploration within new paradigms, and this has generated a visible increase in scientific production. Thus, the analysis reveals a complex picture of the evolution and distribution of academic contributions, highlighting both divergences and convergences in the global academic landscape.
4.4. Country-Level Research Analysis and Collaboration
This section addresses RQ5 by analyzing the geographical distribution of research contributions and identifying the most active countries in advancing the field of artificial intelligence and machine learning within the banking sector, as revealed by the bibliometric data.
Figure 13 presents the world map of country collaborations from the Web of Science database. It provides a comprehensive representation of international networks of research collaboration, highlighting the connections between different countries and underlining the dynamics of global academic partnerships. This visualization suggests the existence of well-established networks, reflecting knowledge exchanges and collaborations between institutions in different regions of the world.
First, global collaboration centers are dominated by countries such as the United States, China, and Western European countries. They act as central nodes in the global network, highlighting their dominant position in scientific production and transnational research initiatives. Also, the Asia-Pacific regions, including Australia, play a key role by connecting institutions in the southern hemisphere with those in other parts of the world, underlining the integrative nature of scientific research.
On the other hand, the dynamics of connections show that the strongest collaborative relationships are represented by thick links between the United States and countries in Europe and Asia. These well-established partnerships reflect the frequent exchange of knowledge and resources between the most advanced research centers. At the same time, there are signs of emerging collaborations between Central and Eastern European countries, such as Romania, and research centers in Asia and North America. These new connections highlight the diversification of academic networks and the increasing contribution of developing countries to global research efforts.
Regional and global interactions also highlight an intensification of collaborations between European countries, especially within the European Union. This reflects regional integration initiatives, which facilitate the development of joint research and the exchange of good practices. Similarly, China has an increasingly visible presence in the international collaboration landscape, with extensive links to Europe, North America and Australia. This underlines its rise as a key player in global research, but also its involvement in addressing global challenges.
The global dimension of research, as highlighted by the map, shows that it is a joint effort of the international community. Frequent transcontinental interactions highlight the fact that global issues such as digitalization, climate change or sustainability require cooperation between countries. These networks facilitate the exchange of knowledge and technologies, contributing to the development of integrated solutions to contemporary challenges. In the future, strengthening international partnerships and extending them to less represented regions such as South America and Africa could further diversify the landscape of scientific collaboration. The involvement of these new actors in global networks could bring new and innovative perspectives, contributing to the development of more equitable and sustainable solutions. Thus, the map reflects the importance of continued international cooperation, which supports scientific progress and responds to global needs.
Figure 14 illustrates the distribution of publications by country of corresponding authors in the Web of Science database, specifically analyzing collaboration at national (SCP—Single Country Publication) and international (MCP—Multiple Country Publication) levels. The distribution highlights both the general trends of global scientific production and the collaboration strategies preferred by different countries.
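The SCP/MCP distinction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the records are hypothetical, and the rule used (a publication counts as MCP when its authors come from more than one country) is the usual reading of these labels rather than a reproduction of the Bibliometrix implementation.

```python
from collections import Counter


def classify_publication(author_countries):
    """Return 'SCP' if all authors share one country, otherwise 'MCP'."""
    distinct = {c.strip().lower() for c in author_countries if c.strip()}
    if not distinct:
        raise ValueError("at least one author country is required")
    return "SCP" if len(distinct) == 1 else "MCP"


# Hypothetical records: (corresponding author's country, countries of all authors).
records = [
    ("China", ["China", "China"]),
    ("China", ["China", "USA"]),
    ("France", ["France", "Germany", "Switzerland"]),
    ("India", ["India"]),
]

# Tally SCP and MCP counts per corresponding-author country.
counts = Counter(
    (country, classify_publication(authors)) for country, authors in records
)
```

Grouping by the corresponding author's country, as in the tally above, mirrors how the figure attributes each paper to a single country while still recording whether it involved international co-authors.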
First of all, countries such as China, the USA and India stand out for their very high number of publications, which indicates a high level of involvement in scientific research. This strong presence demonstrates the capacity of these states to produce new knowledge and to contribute significantly to their fields of interest. Moreover, publications from these countries are mainly of the SCP (Single Country Publication) type, reflecting a majority focus on domestic research. This approach can be attributed to the significant resources available at national level, which allow for autonomous research.
Secondly, the distribution of MCP (Multiple Country Publication) publications, which highlight international collaborations, is more visible in the case of European countries such as France, Germany and Switzerland. This underlines a strong orientation towards global cooperation networks and active involvement in transnational projects. Europe clearly tends to promote international collaborations more than other regions, a phenomenon that can be explained by the common research policies promoted by the European Union and by the open working cultures in academic fields.
On the other hand, countries such as Romania, Morocco, Vietnam or Egypt contribute with a relatively low number of publications, which can be attributed to more limited resources or research priorities that focus on local needs. In these countries, publications are predominantly SCP type, suggesting a lower participation in global initiatives, but an internal development adapted to the specific problems of each country.
A notable aspect of the analysis is the leading role of the USA and China, not only in volume, but also in involvement in international collaborations. Although SCP is dominant for both nations, they present a significant number of MCP publications, reflecting their influence in global research networks. This balance between domestic research and external collaborations highlights diverse strategies to capitalize on resources and strengthen their position on the global scientific scene.
In terms of geographical distribution, it is evident that Asian regions, such as China, India or Vietnam, favor national research, which reflects either a need to develop their own solutions or a reduced dependence on international collaborations. In contrast, European countries present a different profile, focused more on cross-border collaborations, a phenomenon that indicates a high degree of integration into the global scientific community.
The geographical distribution of international collaborations in the Scopus database also shows networks of academic partnerships between different countries. The intensity of collaboration is represented by the color blue, where darker shades indicate a more significant contribution to the analyzed academic literature, and the lines drawn between countries suggest direct links, indicating the flows of collaboration. Among the main observations, the significant contribution of India stands out (
Figure 15): the intensity of the dark blue suggests a central role of this country in the production of scientific literature related to the analyzed topics.
The map also highlights active global collaboration networks, especially between India and countries in North America, such as the United States, in Europe, including the United Kingdom and other European states, as well as in Southeast Asia, which emphasizes the transcontinental involvement in research and the globalization of the studied topics. In parallel, countries marked in lighter shades of blue, such as those in Africa, Oceania and South America, reflect a lower level of collaboration, but their inclusion in the map highlights global participation, even at a lower intensity. Emerging trends suggest a concentration of collaborations around major hubs, such as India, the United States and Europe, indicating that these regions are the main drivers of academic research in the selected field.
Figure 16 provides the distribution of published papers, classified by country of corresponding authors in the Scopus database. Thus, India stands out significantly for the high number of papers, around 15, most of which are produced in domestic collaboration (SCP), indicating a high capacity for national academic production. Similarly, China and the United Kingdom stand out for a balanced mix of MCP and SCP, highlighting both active international collaboration and their own research capacity. The United States and Australia also contribute a significant number of papers, most of them in the MCP framework, reflecting an active involvement in international collaboration networks. On the other hand, countries such as Bahrain, Spain and Turkey, with a lower number of publications, present a balance between SCP and MCP, indicating a diversity in collaboration styles. In terms of small but diverse contributions, countries such as Greece, Brazil, Egypt, Italy and Hungary have a more modest presence but are actively participating in international and national collaborations, suggesting a potential for development in the field under review. Overall, the general trends show a preponderance of SCP in many countries, suggesting a local focus of research; however, countries that combine SCP and MCP, such as China and the United Kingdom, highlight the importance of international collaborations for advancing the field.
4.5. Analysis of Specialized Journals
This section addresses RQ6 by examining the academic journals that serve as key platforms for disseminating the most highly cited research on artificial intelligence and machine learning in the banking sector, highlighting their role in shaping the field’s scholarly discourse.
Figure 17 highlights the distribution of the most frequently cited sources within a corpus of scientific literature from the Web of Science database, revealing the importance of these sources in the local context of the study. The horizontal axis represents the number of citations, which indicates the impact and relevance of each source analyzed. The analysis provides significant insights into the preferences of the academic community and the fundamental sources used in the field of research.
First, ARXIV and Expert Systems with Applications stand out as the sources with the greatest local influence. With an impressive number of 395 citations, ARXIV ranks first, demonstrating its popularity as an open platform for articles and preprints. Its free accessibility and interdisciplinary approach make ARXIV a preferred source for the latest research in data science, artificial intelligence and related fields. In second place, Expert Systems with Applications, with 354 citations, confirms its influence in applied studies related to expert systems and industrial uses of artificial intelligence.
In addition, other significant sources in the technical and operational community are noteworthy. Proceedings of CVPR IEEE, with 183 citations, reflect considerable interest in advanced research in image processing, highlighting its relevance in areas such as visual recognition and machine learning. Similarly, Lecture Notes in Computer Science and the European Journal of Operational Research, each with 132 citations, indicate high relevance for computational and analytical research. While the former supports fundamental studies in computer science, the latter emphasizes the application of AI technologies to decision-making and operations optimization.
Mid-level sources also play an important role within the scientific community. Advances in Neural Information Processing Systems (Adv Neur In) and IEEE Access, each with 124 citations, are widely used for technical and innovative approaches in deep learning, but have a smaller impact compared to the top sources. This suggests that, although their influence is consistent, they are more niche and addressed to a specialized audience.
Regarding niche sources, the Journal of Banking and Finance (95 citations) and Decision Support Systems (92 citations) highlight their importance for the financial-banking fields and the economic applications of artificial intelligence. These publications play a key role in exploring the use of AI technologies for decision support and financial analysis. The list is completed by the Journal of Business Research, with 80 citations, which highlights its contribution to studying the impact of artificial intelligence on the economy and organizational behavior.
4.6. Factor Analysis Approach Through Principal Component Analysis (PCA)
The results of the factor analysis, complemented by the bibliometric insights, underline the significant role of Artificial Intelligence (AI) and Machine Learning (ML) in banking research. These analyses reveal a comprehensive structure of research themes, interconnections, and collaborations within the domain, providing a detailed understanding of the field’s dynamics.
The keyword analysis confirms the dominance of “machine learning” and “artificial intelligence”, which hold the highest occurrences and total link strength. These terms underscore their pivotal role in modern banking research, particularly in areas such as risk management, process optimization, and customer interaction. The bibliometric findings corroborate this, with both databases (Web of Science and Scopus) emphasizing the centrality of these terms. However, the bibliometric analysis additionally highlights differences in thematic focus, where Web of Science leans towards emerging technologies like blockchain, while Scopus prioritizes applications in customer interaction and natural language processing.
The keyword clusters reveal a diverse thematic landscape. While Cluster 1 consolidates the foundational methodologies of AI and ML, other clusters, such as those centered around “banking”, “classification”, and “credit risk”, illustrate applied research areas. Clusters such as big data and deep learning emphasize advanced computational techniques, while prediction and performance suggest a focus on practical outcomes in financial forecasting and operational efficiency. These findings align with bibliometric insights, where institutional collaborations and international research reflect an adaptive response to global challenges, such as the COVID-19 pandemic.
The Principal Component Analysis (PCA) results enhance the keyword analysis by identifying the variance explained by the main components. The scree plot demonstrates that the first component (PC1) captures approximately 78% of the variance, underscoring its significance in summarizing the dataset’s primary trends. The orthonormal loadings suggest that variables like C1 (machine learning) and C2 (artificial intelligence) strongly influence PC1, while C3 (more niche topics) contributes less significantly. These results align with the bibliometric conclusion that AI and ML are central themes, with emerging areas like blockchain or natural language processing contributing as secondary components.
The PCA was conducted on a reduced keyword matrix derived from co-occurrence analysis using VOSviewer version 1.6.18. The initial list contained 54 keywords, which were then filtered to include only those with a frequency ≥ 10 and a total link strength ≥ 50, leading to the selection of 16 variables. These thresholds were determined based on best practices in co-word analysis to ensure thematic relevance and reduce noise [
78].
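The filtering step can be illustrated with a short Python sketch. Only the two thresholds come from the text; the keyword rows and their counts are hypothetical values chosen to show one keyword failing each criterion.

```python
MIN_OCCURRENCES = 10     # frequency threshold stated in the text
MIN_LINK_STRENGTH = 50   # total link strength threshold stated in the text

# Hypothetical (keyword, occurrences, total link strength) rows, for illustration only.
keywords = [
    ("machine learning", 120, 410),
    ("artificial intelligence", 95, 350),
    ("credit risk", 14, 62),
    ("chatbot", 12, 31),       # fails the link-strength threshold
    ("robo-advisor", 6, 55),   # fails the frequency threshold
]


def filter_keywords(rows, min_occ=MIN_OCCURRENCES, min_tls=MIN_LINK_STRENGTH):
    """Keep only rows that meet both thresholds."""
    return [row for row in rows if row[1] >= min_occ and row[2] >= min_tls]


selected = filter_keywords(keywords)
```

Both conditions must hold at once, so a frequent but weakly connected keyword is dropped just as a well-connected but rare one is.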
The PCA was performed in EViews 12, using the correlation matrix approach with unrotated factor solutions. Prior to factor extraction, Kaiser-Meyer-Olkin (KMO) [
79] and Bartlett’s Test of Sphericity [
80] were applied to confirm sampling adequacy and factorability. The KMO measure was 0.631—deemed “mediocre” yet acceptable per Kaiser’s criteria—while Bartlett’s test was significant (
p < 0.001), justifying the use of PCA.
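For readers who want to see how the two diagnostics are computed, the following Python sketch applies the standard formulas to a correlation matrix. It is an illustration under assumptions: the 3×3 matrix is hypothetical (not the study's data), the sample size of 16 is assumed for the example, and NumPy is used in place of EViews.

```python
import numpy as np


def kmo_overall(corr):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix."""
    inv = np.linalg.inv(corr)
    # Partial correlations: -inv_ij / sqrt(inv_ii * inv_jj).
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off = ~np.eye(corr.shape[0], dtype=bool)  # mask selecting off-diagonal cells
    r2 = np.sum(corr[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


def bartlett_sphericity(corr, n_obs):
    """Bartlett's chi-square statistic and its degrees of freedom."""
    p = corr.shape[0]
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
    dof = p * (p - 1) // 2
    return chi2, dof


# Hypothetical correlation matrix for three variables; not the study's data.
R = np.array([
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.2],
    [0.3, 0.2, 1.0],
])
kmo = kmo_overall(R)
chi2, dof = bartlett_sphericity(R, n_obs=16)  # n_obs=16 is an assumed sample size
```

The p-value would then follow from the chi-square survival function with `dof` degrees of freedom (for instance `scipy.stats.chi2.sf`), which is left out here to keep the sketch dependent on NumPy only.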
Three components were extracted, but interpretation was restricted to those with eigenvalues > 1.0, following the Kaiser criterion, and validated using scree plot inflection points. The first component accounted for 77.98% of the variance and reflects a dense cluster of foundational terms such as “machine learning,” “artificial intelligence,” and “classification.” The second component explained 21.85%, dominated by emergent, high-impact topics like “blockchain,” “deep learning,” and “prediction.” The third component contributed a negligible 0.18% and was not retained in interpretation.
The factor loadings revealed clear thematic separations, allowing for the identification of core versus peripheral research fields. This process advances beyond simple mapping and enables nuanced detection of latent structures within the bibliometric space [
76].
The bibliometric analysis highlights variations in collaboration networks. While Web of Science reflects a concentration of research efforts around a few key authors, Scopus suggests a more diverse and collective approach. This distinction is important, as it reveals how different publication ecosystems foster distinct research dynamics. Moreover, the factor analysis confirms the prominence of international collaborations, particularly between the USA, China, and Europe, reinforcing the global relevance of banking research driven by AI and ML. The emphasis on keywords like “credit risk”, “bankruptcy prediction”, and “support vector machines” reflects the increasing application of AI and ML in addressing financial risks and enhancing predictive analytics. These findings complement the bibliometric insights on the growing relevance of customer interaction tools and optimization techniques, suggesting a convergence of theoretical advancements and practical implementations.
The bibliometric analysis also emphasizes the growing role of European and Asian contributions in banking research, possibly driven by increased digital transformation and regulatory innovation in these regions. This trend is mirrored in the clustering of applied topics like “fintech” and “finance”, pointing towards region-specific research priorities. Therefore, the integration of factor analysis and bibliometric data provides a comprehensive perspective on the evolving landscape of AI and ML in banking. While the centrality of AI and ML is universally acknowledged, regional and thematic variations suggest a dynamic and multifaceted field. Future research should further explore niche areas, such as blockchain and customer interaction technologies, while fostering balanced international collaborations to maximize the impact of these advancements in banking. This combined approach ensures that theoretical innovation is matched by practical applicability, driving the sustainable transformation of financial services.
In order to prepare the dataset, we selected the variables C1 (occurrences), C2 (total link strength) and C3 (clusters), each corresponding to the most influential keywords (
Table 1). The selection of variables in this table was guided by their ability to reflect the thematic and methodological structure of research in Artificial Intelligence (AI) and Machine Learning (ML) within the banking sector. Each variable represents a keyword that serves as a proxy for a specific research focus or methodological approach, allowing for a comprehensive analysis of the domain. The inclusion of these keywords is based on their frequency of occurrence, their connectivity within the research network, and their clustering patterns, all of which provide critical insights into the centrality, relationships, and substructures within the field.
Occurrences (C1) measure the prominence of each keyword in the dataset, with higher values indicating themes that are extensively studied or foundational, such as “machine learning” and “artificial intelligence.” These terms represent core concepts driving innovation and research in banking. In contrast, keywords with lower occurrences, such as “support vector machines” and “bankruptcy prediction,” reflect niche or specialized areas of application that contribute to specific challenges or solutions within the sector.
The total link strength (C2) quantifies the degree of co-occurrence between keywords, serving as an indicator of their interconnectedness within the research network. High link strength values suggest that these terms frequently appear together in studies, signifying their role as integrative or complementary elements in the research landscape. For instance, “big data” and “deep learning” show moderate link strengths, highlighting their importance as enablers of advanced analytics in banking.
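As a rough illustration of how total link strength is obtained, the sketch below sums, for each keyword, the strengths of all co-occurrence links it participates in. The pair counts are hypothetical, and the function is a plain Python stand-in for what VOSviewer reports, not its actual code.

```python
# Hypothetical co-occurrence counts between keyword pairs (each pair listed once).
cooccurrence = {
    ("machine learning", "artificial intelligence"): 40,
    ("machine learning", "deep learning"): 18,
    ("artificial intelligence", "big data"): 12,
    ("deep learning", "big data"): 7,
}


def total_link_strength(pairs):
    """Sum, for each keyword, the strengths of all links it takes part in."""
    totals = {}
    for (a, b), strength in pairs.items():
        totals[a] = totals.get(a, 0) + strength
        totals[b] = totals.get(b, 0) + strength
    return totals


tls = total_link_strength(cooccurrence)
```

With these illustrative counts, "machine learning" ends up with the highest total because it is involved in the two strongest links, which is the sense in which a high value marks a keyword as a connector in the network.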
Clusters (C3) further refine the selection by categorizing keywords into thematic groups based on their co-occurrence patterns. These clusters reveal the subfields or specialized areas within the broader domain, such as financial applications of AI (e.g., “credit risk” and “bankruptcy prediction”) or methodological advancements (e.g., “classification” and “support vector machines”). Clustering also demonstrates the interdisciplinary nature of AI and ML research, where techniques and applications converge to address complex problems in banking.
The rigor of this selection process lies in its alignment with bibliometric methodologies, which prioritize variables that balance high relevance with diverse representation. By including keywords that span foundational concepts, methodological innovations, and applied topics, the analysis ensures a multidimensional understanding of the research field. This approach allows for a detailed exploration of both dominant trends and emerging areas, highlighting the interconnected and dynamic nature of AI and ML research in banking.
Furthermore, the decision to employ Principal Component Analysis (PCA) in the analysis of the selected variables is rooted in its methodological rigor and its ability to distill high-dimensional data into a reduced set of uncorrelated components. PCA is particularly well-suited for analyzing bibliometric data because it identifies underlying patterns and relationships within a complex dataset while preserving the variance structure. This statistical technique aligns with the objectives of the study, which seeks to uncover dominant research themes and their interconnections in the field of Artificial Intelligence (AI) and Machine Learning (ML) in banking.
One of the primary reasons for using PCA is its capacity to address multicollinearity among variables. In bibliometric datasets, keywords often exhibit high correlations due to their frequent co-occurrence in the same studies or thematic areas. For instance, “machine learning” and “artificial intelligence” are strongly correlated as they represent foundational methodologies in banking research. PCA resolves this issue by transforming the original variables into orthogonal components, ensuring that each principal component (PC) represents unique dimensions of the data without redundancy. This transformation enables a clearer interpretation of the relationships among variables.
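The decorrelation property described above can be checked numerically. The sketch below is a minimal NumPy illustration on synthetic data (16 observations of three variables, two of them deliberately correlated); the data, the seed, and the variable names are assumptions made for the example and do not reproduce the study's matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 16 observations, the first two variables strongly correlated.
base = rng.normal(size=16)
X = np.column_stack([
    base + 0.1 * rng.normal(size=16),
    base + 0.1 * rng.normal(size=16),
    rng.normal(size=16),
])

# Standardize, then decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.corrcoef(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigh returns ascending order
order = np.argsort(eigenvalues)[::-1]             # reorder to descending
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Component scores: projections of the standardized data onto the eigenvectors.
scores = Z @ eigenvectors
score_cov = np.cov(scores, rowvar=False)
```

The covariance matrix of the scores is diagonal up to floating-point error, with the eigenvalues on the diagonal, which is what "orthogonal components without redundancy" means in practice.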
Additionally, PCA is employed to reduce the complexity of the dataset while retaining most of the original information. Bibliometric data often contain numerous variables, such as occurrences, link strengths, and clusters, which can make direct analysis cumbersome and less insightful. PCA simplifies this complexity by aggregating the variance explained by multiple variables into a few principal components. For example, in this study, the first component (PC1) may capture the variance associated with core themes like “machine learning” and “artificial intelligence,” while subsequent components may represent niche areas such as “credit risk” or “deep learning.” This dimensionality reduction enhances interpretability and allows researchers to focus on the most influential factors driving the research landscape.
Another justification for PCA lies in its ability to quantify the contribution of each variable to the overall dataset structure. The eigenvalues and eigenvectors computed during PCA indicate the proportion of variance explained by each component and the loading of each variable on these components. This quantification provides empirical evidence for the centrality of dominant themes, such as the high loading of “machine learning” on PC1, validating its importance in the field. Furthermore, the visualization tools associated with PCA, such as biplots and scree plots, facilitate the identification of clusters and outliers, offering additional insights into the thematic organization of the data.
From a methodological perspective, PCA aligns seamlessly with the goals of bibliometric and factor-analytic studies. Its application allows for a systematic and unbiased examination of research trends, ensuring that the analysis captures both the breadth and depth of the field. By uncovering latent structures and reducing dimensionality, PCA not only enhances the rigor of the study but also provides a robust foundation for interpreting the complex interplay of themes, methodologies, and applications within AI and ML research in banking. This approach ensures that the findings are both statistically sound and practically relevant, offering valuable insights into future research and policy development.
The results of the PCA are outlined in
Table 2 and
Figure 18,
Figure 19 and
Figure 20. The scree plot (
Figure 18) presents the eigenvalues of the components extracted through Principal Component Analysis (PCA). The first eigenvalue is 2.339, explaining 77.98% of the total variance, while the second eigenvalue is 0.655 (21.85% of the variance), and the third is minimal at 0.005 (0.18% of the variance). This steep drop in eigenvalues after the first component demonstrates that the first principal component (PC1) captures most of the variability in the data. The cumulative proportion reaching 99.82% after the second component suggests that the dataset can effectively be reduced to two principal components without significant loss of information.
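The link between the reported eigenvalues and the variance shares can be verified with simple arithmetic: each share is the eigenvalue divided by the sum of all eigenvalues. The sketch below uses the three eigenvalues quoted above; because they are rounded to three decimals, the computed shares match the reported percentages only up to rounding.

```python
# Eigenvalues as reported in the text (rounded to three decimals).
eigenvalues = [2.339, 0.655, 0.005]

total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

# Running total of the variance shares.
cumulative = []
running = 0.0
for share in proportions:
    running += share
    cumulative.append(running)
```

The first two entries of `cumulative` show how quickly the explained variance saturates, which is the basis for reducing the dataset to two components.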
The orthonormal loadings plot (
Figure 19A) shows how variables C1, C2, and C3 are distributed across the first two components. C1 (representing “machine learning”) and C2 (representing “artificial intelligence”) have high positive loadings on PC1, indicating their strong influence on the primary dimension of variance. C3 (representing niche or secondary keywords) has a strong loading on PC2 and negative contributions to PC1. This plot highlights the centrality of AI and ML (captured by C1 and C2) as driving forces in the research dataset. The secondary influence of C3 underscores its role in capturing emerging or specialized areas (e.g., “deep learning” or “credit risk”).
The scores plot (
Figure 19B) maps individual data points (observations) onto the two primary components, PC1 and PC2. Most observations cluster around the origin, indicating a concentration of research efforts on common or overlapping themes related to AI and ML in banking. However, a few outliers (e.g., points labeled 1 and 2) are situated far from the cluster, representing unique or niche studies within the dataset. This clustering reinforces the notion that AI and ML dominate the research landscape, while the outliers reflect specialized studies focusing on unique methodologies or applications.
The biplot (
Figure 19C) combines the loadings and scores, visualizing the relationship between variables (C1, C2, C3) and observations. C1 and C2 point strongly towards PC1, reflecting their dominance in the dataset. C3, pointing along PC2, aligns with studies focusing on specialized themes, as indicated by the outlier data points. The interpretation is that the research field is heavily influenced by core themes like AI and ML, while secondary areas (e.g., “classification” or “support vector machines”) complement these dominant themes.
The variability plot (fluctuation of C1, C2, C3) (
Figure 20) over observations reveals contrasting trends. C1 and C2 display significant variability, reflecting their dominant but diverse application across studies. C3, by contrast, remains relatively stable, indicating its limited but consistent role in capturing niche aspects of the dataset.
This stability supports the conclusion that C3 (e.g., topics like “credit risk” or “bankruptcy prediction”) represents a less variable but specialized focus within the broader field of AI and ML.
Thus, the PCA results confirm that PC1 (with an eigenvalue of 2.339) captures the largest portion of variability (77.98%), followed by PC2 (21.85%), and PC3 with negligible contribution. This aligns with the scree plot, indicating that most research themes can be reduced to two dimensions without significant loss of information. The eigenvector loadings show that C1 and C2 are positively correlated with PC1, while C3 aligns more with PC2. This division indicates that the primary component captures foundational themes (AI and ML), while the secondary component represents niche or emerging areas.
Factor analysis conducted through Principal Component Analysis (PCA) [
70] serves as an important complement to bibliometric analysis performed in Bibliometrix, providing a more rigorous and multidimensional understanding of the research landscape. While bibliometric analysis offers a descriptive and relational overview by identifying trends, thematic clusters, and co-occurrence networks, PCA enhances these insights by introducing a quantitative framework to uncover latent structures, quantify contributions, and address the complexity of the dataset. The integration of these methodologies allows for a deeper exploration of both dominant themes and emerging areas in the field of Artificial Intelligence (AI) and Machine Learning (ML) in banking.
Bibliometric analysis lays the groundwork by organizing data into networks that reveal the frequency and interconnections of keywords, clusters, and research collaborations. However, its descriptive nature limits its ability to quantify the relationships between variables or to prioritize themes based on their explanatory power. This is where PCA becomes indispensable, as it identifies the principal dimensions of variance within the dataset, effectively reducing its complexity while retaining its most significant patterns. By transforming correlated variables into orthogonal components, PCA eliminates redundancy and ensures that each principal component represents a unique aspect of the research landscape.
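To illustrate how correlated variables are transformed into orthogonal components, here is a minimal sketch using NumPy on synthetic data. The three columns stand in for C1, C2, and C3; the data, the random seed, and the variable names are assumptions made for illustration and are not the study's dataset.

```python
import numpy as np

# Synthetic stand-in data (NOT the study's dataset): C1 and C2 are strongly
# correlated, C3 is generated independently.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([
    base + 0.1 * rng.normal(size=200),   # C1
    base + 0.1 * rng.normal(size=200),   # C2
    rng.normal(size=200),                # C3
])

# Standardize, then diagonalize the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)   # eigh returns ascending order

# Sort components by decreasing eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Project observations onto the components to obtain the scores.
scores = Z @ eigenvectors

# The eigenvectors are orthonormal and the scores are mutually uncorrelated:
# the Gram matrix is the identity and the score covariance is diagonal.
gram = eigenvectors.T @ eigenvectors
score_cov = np.cov(scores, rowvar=False)
print(np.round(gram, 6))
print(np.round(score_cov, 6))
```

The diagonal of the score covariance matrix reproduces the eigenvalues, which is why each component can be read as carrying a separate, non-overlapping share of the total variance.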
Moreover, PCA not only validates but also refines the findings of bibliometric analysis. For instance, bibliometric tools might identify “machine learning” and “artificial intelligence” as central themes based on their high frequency and strong co-occurrence. PCA corroborates this centrality by demonstrating that these variables dominate the variance explained by the first principal component, providing empirical evidence for their foundational role. Furthermore, PCA’s capacity to quantify the variance associated with secondary components highlights emerging but less prominent themes, such as “deep learning” or “credit risk,” offering a nuanced understanding of niche areas that might otherwise be overlooked.
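The frequency and co-occurrence counts that bibliometric tools rely on can be sketched in a few lines. The keyword lists below are hypothetical examples, not records from Scopus or Web of Science, and the sketch does not claim to reproduce Bibliometrix's own implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one per document (illustration only).
documents = [
    ["machine learning", "artificial intelligence", "credit risk"],
    ["machine learning", "artificial intelligence", "banking"],
    ["deep learning", "credit risk", "machine learning"],
    ["artificial intelligence", "banking"],
]

# Keyword frequency: number of documents in which each keyword appears.
frequency = Counter(kw for doc in documents for kw in set(doc))

# Co-occurrence: number of documents in which a pair appears together.
# Sorting each pair makes ("a", "b") and ("b", "a") count as the same edge.
cooccurrence = Counter(
    pair
    for doc in documents
    for pair in combinations(sorted(set(doc)), 2)
)

print(frequency.most_common(3))
print(cooccurrence.most_common(3))
```

Counts of this kind describe which keywords are frequent and linked, but they do not by themselves indicate how much of the overall variation each theme accounts for, which is the gap PCA is used to fill here.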
The integration of PCA and bibliometric analysis also facilitates a more systematic visualization of the research structure. While bibliometric clustering maps thematic groupings, PCA enhances this by organizing variables and observations into a hierarchy of components ordered by the share of variance each explains. This synergy not only clarifies the thematic organization but also enables the identification of outliers or unique contributions, providing actionable insights for future research directions.
Thus, factor analysis through PCA [72] enriches bibliometric analysis by adding statistical rigor and revealing hidden dimensions within the data. This complementary relationship ensures a comprehensive perspective that balances descriptive and quantitative insights, making it possible to capture both overarching trends and subtle thematic variations. Such an integrated approach is particularly valuable in the dynamic and complex field of AI and ML in banking, where a multidimensional understanding of research themes and their interconnections is essential for advancing knowledge and informing evidence-based policy decisions.
When integrated with the bibliometric analysis, these results paint a comprehensive picture of the research field:
The dominance of AI and ML as central themes is evident both in PCA and bibliometric analyses, supported by high occurrences and strong linkages in keyword networks.
The PCA’s secondary components align with bibliometric findings that highlight emerging areas like customer interaction tools in Scopus or blockchain in Web of Science.
The internationalization and collaborative nature of research, emphasized in bibliometric analysis, are indirectly reflected in the clustering of data points, showing overlaps and shared themes across studies.
The factor analysis complements the bibliometric findings by quantitatively confirming the centrality of AI and ML while identifying secondary dimensions that represent emerging or specialized research areas. The visualizations and statistical results collectively demonstrate a research field characterized by dominant methodologies and applications, alongside a growing exploration of niche topics. Future research can leverage these insights to explore underrepresented areas or enhance interdisciplinary collaboration.
The integration of bibliometric analysis and factor analysis offers a comprehensive and rigorous exploration of research trends in Artificial Intelligence (AI) and Machine Learning (ML) within the banking sector. By combining these two methodologies, this study bridges quantitative insights from statistical modeling with qualitative and networked understandings from bibliometric tools. This hybrid approach not only confirms the centrality of core themes but also uncovers nuanced patterns that would remain obscured when relying on a single method.
5. Discussion
5.1. Interpretation of Results from a Sustainable Banking Perspective
The keyword network analysis confirms the central role of artificial intelligence and machine learning concepts in banking literature, which are mainly associated with risk management, performance analysis, and predictive models. This thematic structure indicates that current research uses AI and ML primarily as quantitative tools for optimizing banking processes and managing traditional financial risks. The predominant focus on operational efficiency and predictive capability suggests a consolidation of these technologies in the core functions of banking.
From the perspective of ESG and sustainable banking, however, the results show that these dimensions are poorly integrated into the core of AI and ML-based research. Although AI and ML provide an appropriate analytical framework for assessing climate risks, analyzing the social impact of banking, and improving decision-making governance, these directions appear marginal in the structure of thematic networks. Concepts associated with sustainability and green transition are poorly connected to the dominant clusters, suggesting that sustainable banking is not yet systematically addressed by advanced artificial intelligence tools.
The differences between the Web of Science and Scopus databases reinforce this interpretation. In Web of Science, the strong orientation towards technological innovation and advanced machine learning methods indicates a focus on the development of analytical tools, with a low recurrence of topics related to banking sustainability. In Scopus, although the thematic structure is more diverse and includes applications oriented towards digital services and financial behavior, the directions associated with sustainable banking do not form a distinct and coherent cluster. In conclusion, the results suggest that ESG and sustainable banking are still peripheral in the literature on AI and ML in banking, being insufficiently correlated with the risk and performance models that dominate current research.
The analysis of author and co-citation networks highlights structural differences between the Web of Science and Scopus databases, reflecting distinct levels of intellectual maturity and thematic orientation of AI and ML research in the banking sector. From a sustainable banking perspective, these differences are relevant because they indicate how scientific knowledge is organized around long-term risks, systemic stability, and the structural transformation of financial intermediation.
In Web of Science, the existence of consolidated intellectual hubs, dominated by authors with consistent output and high visibility in the co-citation network, suggests methodologically stable literature. This stability is associated with the development and refinement of quantitative models used in banking risk management. Such a structure favors analytical consistency and research continuity but may lead to slower integration of emerging dimensions such as climate risks, transition risks, and the assessment of long-term impacts on banking portfolios.
In contrast, the network of authors and co-citations in Scopus is more dispersed and less hierarchical, reflecting a greater diversity of theoretical and methodological perspectives. This structure indicates greater openness to new and interdisciplinary topics, but also a lack of conceptual convergence in the integrated approach to sustainability through AI and ML. The absence of dominant authors or theoretical frameworks suggests that sustainability-related dimensions are still being explored in a fragmented manner, without being consolidated into a unified analytical framework.
A comparative analysis of co-citation networks indicates that, in both databases, the literature continues to be dominated by paradigms oriented towards traditional financial risk and operational efficiency. From a sustainable banking perspective, this suggests that AI and ML are predominantly used to optimize short-term decisions, while their potential for analyzing structural, climate, and sustainability risks is insufficiently integrated into the theoretical core of the field.
An analysis of institutional collaboration networks in Web of Science and Scopus highlights relevant differences in the structure and evolution of research on artificial intelligence and machine learning in the banking sector. From the perspective of the long-term sustainability and resilience of the banking system, these networks reflect how scientific expertise is concentrated and disseminated at the institutional level.
In Web of Science, the steady increase in academic output after 2017 indicates a consolidation of research focused on the development of advanced analytical models. This dynamic suggests the existence of stable research centers that contribute to the methodological continuity necessary for assessing structural and systemic risks. The geographical diversification of the institutions involved broadens the basis for analysis to different banking systems, increasing the literature’s ability to capture heterogeneous vulnerabilities.
In Scopus, the structure of collaborations is more concentrated, with dominant institutions and rapid increases in output. This configuration reflects a more applied orientation and a focus on specific directions, which may accelerate the exploration of emerging themes but limits the development of integrated analytical frameworks on the long-term stability of the banking sector.
Overall, the analysis shows that institutional networks are expanding and becoming increasingly internationalized, but directions related to sustainability, resilience, and structural transformation of banking activity are not yet consolidated around coherent institutional collaborations. The differences between the two databases reflect distinct models of research organization, with direct implications for how AI and ML can support the long-term stability of the banking system.
The analysis of country-level collaborations highlights structural differences between the Web of Science and Scopus databases in terms of the organization and internationalization of research on artificial intelligence and machine learning applied to the banking sector. From the perspective of the long-term stability and resilience of the banking system, these networks indicate how research capacity and knowledge exchange are concentrated globally.
In Web of Science, the United States, China, and Western European countries act as central nodes of international collaborations, facilitating intense flows of knowledge between advanced research centers. This structure suggests a concentration of expertise in mature economies, which influences the dominant directions of research and the development of analytical models used in banking risk assessment. The presence of emerging connections with countries in Central and Eastern Europe indicates a gradual expansion of networks, but with a still secondary role in global scientific output.
In Scopus, collaborations are more evenly distributed between national and international research, with countries such as India, the United States, China, and the United Kingdom playing a central role. This structure reflects a diversification of contributions and a broader integration of research from emerging economies, without, however, changing the overall hierarchy of scientific influence.
In both databases, the intensification of research activity in 2020–2024 indicates an acceleration of interest in digitalization and structural transformations in the banking sector. The results show that, although international collaborations are extensive, research remains concentrated around a limited number of advanced economies, with direct implications for the dominant directions of AI and ML development in banking.
5.2. Factor Analysis: Novel Contributions and Implications for Academia and Practitioners
Factor analysis, as applied in this study through Principal Component Analysis (PCA), provides a structured method for reducing multidimensional data into its principal components. By extracting and analyzing eigenvalues, eigenvectors, and correlations, factor analysis identifies latent patterns and hierarchical structures that characterize the research field. The strength of this approach lies in its ability to quantify variance and identify dominant dimensions, such as the foundational importance of AI and ML, while simultaneously isolating niche contributions, such as “credit risk” or “deep learning”.
The methodological rigor of factor analysis is evident in the systematic decomposition of variance. The scree plot revealed that the first principal component (PC1) accounts for nearly 78% of the total variance, emphasizing the dominance of key themes. Furthermore, the orthonormal loadings and scores clearly delineate the relationships between keywords and their underlying dimensions. These results provide an empirical foundation for interpreting the prominence of thematic clusters while ensuring objectivity and replicability. Unlike bibliometric tools, which rely on co-occurrence networks, factor analysis adds a robust statistical layer, quantifying the relative influence of variables across dimensions.
While bibliometric analysis maps thematic networks and author collaborations, it lacks the quantitative precision necessary to evaluate hierarchical relationships or measure variance within the data. However, when integrated with factor analysis, as in this study, it enables a more detailed exploration of structural dynamics. For instance, bibliometric results underscore the centrality of “artificial intelligence” and “machine learning,” corroborating factor-analytic findings that these themes dominate PC1. Moreover, the bibliometric insight into thematic shifts between databases—where Web of Science emphasizes blockchain and Scopus highlights customer interactions—aligns with the orthonormal scores, which show how PC2 captures these secondary but emerging areas.
This complementarity is particularly evident in the identification of niche clusters. Bibliometric clustering revealed distinct themes such as “fintech” and “deep learning,” which factor analysis further validated by showing their limited but focused contributions to PC2. Together, these methodologies capture both the macro trends (core themes like AI and ML) and micro trends (specific applications like “support vector machines” or “bankruptcy prediction”), offering a holistic perspective on the research landscape.
The results provide a layered understanding of the field. The dominance of PC1, as shown in the scree plot and eigenvalues, reflects a strong concentration of research on AI and ML methodologies. These technologies, represented by variables C1 and C2, are not only the most frequently occurring but also the most interconnected, as evidenced by their high eigenvector loadings and correlation coefficients. This centrality underscores their foundational role in advancing predictive analytics, fraud detection, and process optimization within banking.
However, the second principal component (PC2) highlights the diversity within the field, capturing emerging areas that extend beyond traditional AI and ML applications. For example, topics such as “deep learning” and “credit risk,” while secondary, represent focused streams of research that address specific challenges in banking. The stability of C3, as seen in the variability plot, reinforces its role in anchoring these niche areas, providing consistent but less dominant contributions.
The orthonormal loadings and scores further reveal how individual studies align with these components. The clustering of observations near the origin indicates a convergence of research around common themes, such as risk management and customer interactions. Conversely, the presence of outliers in the scores plot reflects innovative or specialized studies, such as those exploring blockchain or non-traditional credit evaluation methods. These findings align with bibliometric insights into international collaboration, where regions like Asia and Europe have driven niche innovations, influenced in part by the COVID-19 pandemic.
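One simple way to make "near the origin" versus "outlier" concrete in a scores plot is the distance of each observation from the origin in the PC1-PC2 plane. The sketch below uses hypothetical scores and an arbitrary cutoff of twice the median distance; neither the values nor the threshold come from the study.

```python
import math
from statistics import median

# Hypothetical (PC1, PC2) scores for six observations (illustration only).
scores = [(0.1, -0.2), (0.3, 0.1), (-0.2, 0.2), (0.0, 0.1), (2.5, 0.4), (0.2, 3.1)]

# Euclidean distance of each observation from the origin of the PC1-PC2 plane.
distances = [math.hypot(pc1, pc2) for pc1, pc2 in scores]

# ASSUMED rule of thumb: flag points farther than twice the median distance.
cutoff = 2 * median(distances)
outliers = [i for i, d in enumerate(distances) if d > cutoff]

print(f"cutoff = {cutoff:.3f}")
print("outlier indices:", outliers)
```

In this toy example the last two observations are flagged: one lies far along PC1 and the other far along PC2, mirroring the distinction between studies that are extreme on the core themes and those that are extreme on the specialized ones.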
Overall, the dominant principal component (PC1) highlights well-established themes such as predictive analytics and AI/ML integration in operational banking. However, the second component (PC2), representing niche topics like “credit risk” and “bankruptcy prediction,” reveals gaps in integrating ESG-specific variables and climate-related stress testing. Additionally, regional clustering showed that European and Asian institutions focus on fintech and blockchain, but less on sustainability-linked outcomes. These findings suggest underexplored opportunities for applying ML in climate risk modeling and green finance, areas currently underserved in the literature.
The integration of bibliometric and factor-analytic approaches introduces several novel contributions to the understanding of AI and ML research in banking. First, the use of factor analysis quantifies the dominance and interrelations of variables, providing empirical validation for bibliometric observations. For example, while bibliometric clustering identifies “machine learning” as a central node, factor analysis assigns it the highest eigenvector loading, confirming its statistical significance.
Second, the combination of methodologies reveals hidden patterns that would remain inaccessible in isolation. Bibliometric tools, while effective in mapping collaborations and thematic networks, cannot quantify the variance explained by different themes. Conversely, factor analysis, while robust in dimensional reduction, lacks the networked perspective offered by bibliometric tools. Together, they offer a multidimensional view, balancing quantitative precision with qualitative richness.
Finally, this integrated approach enhances the granularity of insights. For instance, bibliometric findings show a shift towards customer-centric applications in Scopus, while factor analysis, through PC2, quantifies the variance associated with these emerging trends. This granularity not only validates the findings but also offers actionable insights for future research and policy.
The combined use of factor analysis and bibliometric methodologies has significant implications for both academic research and practical applications in banking. Academically, it establishes a robust framework for future studies, enabling researchers to systematically explore thematic hierarchies and interrelations. Practically, it highlights the critical areas for innovation, such as the integration of AI with fintech solutions or the use of ML for non-traditional credit assessments. Furthermore, the international collaboration trends revealed by bibliometric analysis suggest opportunities for cross-border partnerships, particularly in addressing global challenges like financial inclusion and cybersecurity.
By integrating bibliometric and factor-analytic methodologies, this study provides a comprehensive and nuanced understanding of the role of AI and ML in banking research. The complementarity of these approaches not only validates core themes but also uncovers emerging trends and niche areas, offering a multidimensional perspective on the research landscape. This hybrid methodology represents a significant advancement in the analysis of scientific literature, demonstrating its potential to inform both academic inquiry and practical innovation in the rapidly evolving field of banking technology. Future studies should continue to leverage these methodologies, exploring their application across other domains and expanding the scope of interdisciplinary collaborations.
5.3. Directions for Future Research
The results of this study show that artificial intelligence and machine learning are currently used primarily to assess traditional financial risks and improve banking performance. For sustainability-oriented banks, a clear direction for research is to analyze how these technologies can be used to identify customers and projects that pose long-term risks, such as exposure to climate change or major economic transformations (Table 3).
Another important direction is to study how banks can combine automated decisions with sustainable development goals. Future research could examine whether AI models can support lending decisions that focus not only on immediate profit, but also on the long-term stability of bank portfolios and their impact on the environment and society.
Greater attention also needs to be paid to differences between banks and countries. Future studies could compare how banks in developed and emerging economies use AI to support more responsible banking. This would help to understand the real limitations related to data, technology, and institutional capacity.
A practical and easily applicable direction is to analyze how bank employees use AI-generated results. Research can examine whether these results are followed automatically or whether they are adjusted by specialists, especially in decisions that have a long-term impact on the stability of the bank.
Finally, future literature should focus more on studies based on real data from banks, not just simulations or theoretical models. Such research can show more clearly the extent to which artificial intelligence actually contributes to building a more stable, responsible, and sustainable banking sector.
6. Conclusions
The findings of this study reflect a comprehensive and detailed picture of the evolution and research directions on the use of artificial intelligence (AI) and machine learning (ML) in the banking sector, based on the analysis of Web of Science and Scopus databases.
With regard to RQ1 (How has the publication of academic articles on the use of artificial intelligence and machine learning in the banking sector evolved according to data from Scopus and Web of Science databases?), the analysis of the evolution of academic publications shows a rapid and diversified growth of interest in the application of AI and ML in banking, especially in the period 2020–2024. This expansion can be correlated with the recent global challenges, such as the COVID-19 pandemic, which have accentuated the need for digitization and optimization of banking processes.
For RQ2 (What are the main emerging research directions on the use of artificial intelligence and machine learning in the banking sector, as identified through keyword network analysis?), the keyword network analysis shows that artificial intelligence and machine learning are the core concepts shaping current research in banking, mainly linked to risk management and process optimization. Beyond these central themes, the emerging research directions differ across databases. In the Web of Science, greater attention is given to technological innovations, particularly the integration of blockchain with AI-based solutions. In contrast, Scopus emphasizes AI applications focused on customer interactions, including natural language processing and data-driven personalization. Overall, the results indicate a dual research trajectory, combining technological innovation with customer-oriented and operational applications of AI in banking.
In terms of author contributions (RQ3: Who are the authors with the highest scientific contributions in the field of artificial intelligence and machine learning applied to the banking sector, according to publications indexed in Scopus and Web of Science?), Web of Science shows a concentration of collaborations around a small number of active authors, such as Zhang Y and Cheng D, indicating a more focused collaboration. On the other hand, Scopus reflects a more diverse network with a more balanced distribution of contributions, suggesting a collective approach to research.
The analysis of institutional collaborations (RQ4: Which research institutions have had the greatest impact on the development of artificial intelligence and machine learning research in the banking sector?) shows a marked internationalization of research, especially in the period 2020–2024, with a significant increase in contributions from Europe and Asia, probably influenced by global challenges such as the COVID-19 pandemic. This is particularly important as it reflects a trend towards globalization of research in banking and emerging technologies, which can support the development of innovative solutions to global economic and financial problems. International collaborations not only improve the quality and diversity of research but also facilitate knowledge sharing between regions affected by global crises, stimulating adaptation and the implementation of effective strategies in the face of common challenges.
In terms of geographical distribution of research (RQ5: How is research on artificial intelligence and machine learning in the banking sector geographically distributed and which countries have the most intense scientific activity in this field?), both Web of Science and Scopus emphasize the central role of the USA, China and Europe. However, Scopus suggests a greater balance between national and international research, while Web of Science shows an already well-established network of transcontinental collaborations.
Finally, regarding the academic journals with the highest impact (RQ6: Which academic journals publish the most influential research on artificial intelligence and machine learning in banking, by number of citations?), the most influential research on artificial intelligence and machine learning in the banking sector is mainly published in arXiv and Expert Systems with Applications, which have the highest citation rates and serve as central platforms for disseminating impactful results.
Even though duplicates between the two databases were not excluded, the analyses yielded different results for each database. This confirms that the study carried out offers distinct and complementary perspectives on the topic under analysis. Although there is some overlap, the differences identified emphasize the importance of using multiple sources for a broader and more accurate understanding. The results show that each database makes a unique contribution, reflecting the diversity and complexity of the information available. This diversity allows us to gain a more comprehensive picture and to identify aspects that would have been missed using a single source.
From a novelty perspective, the study stands out by combining factor analysis through Principal Component Analysis (PCA) with bibliometric clustering, an integration that has rarely been applied to banking research. Moreover, the use of eigenvalue decomposition to quantify variance explained by core and niche research themes introduces an empirical framework for evaluating thematic centrality. Additionally, the findings extend beyond methodological insights by demonstrating how bibliometric data, such as co-occurrence networks, can be quantitatively analyzed to provide actionable insights into the structure of a rapidly evolving field.
The implications for policymakers are profound, as the study underscores the strategic importance of fostering research in AI and ML to enhance banking efficiency, financial inclusion, and fraud detection. Furthermore, it highlights the need for investments in emerging areas like blockchain and customer interaction technologies, emphasizing their potential to transform banking operations. Policymakers can also leverage the study’s insights into international collaborations, which reveal the pivotal role of global partnerships in driving innovation. Nevertheless, the study also draws attention to ethical considerations, particularly in the equitable application of AI technologies, suggesting the importance of developing regulatory frameworks that balance innovation with fairness and privacy.
Although the combination of factor analysis and bibliometric analysis provides a structured overview of the literature, this study has several limitations. First, the analysis is based exclusively on the Web of Science and Scopus databases, which may lead to the omission of relevant works indexed in other sources. Second, language restrictions may limit the geographical representativeness of the results.
In addition, bibliometric analysis is based on the co-occurrence of keywords, which captures general thematic structures but does not reflect in detail the full content of the studies analyzed. Finally, the results are not directly correlated with concrete indicators of sustainability or climate performance at the banking level, which limits the assessment of the practical impact of the research. These limitations open up clear directions for future research that integrates more diverse data sources and applied empirical analyses.
In conclusion, the study offers a robust methodological contribution and actionable insights for both academic and policy-making audiences. By addressing its limitations and pursuing the identified future directions, subsequent research can further refine our understanding of AI and ML in banking, ensuring their continued impact on this significant sector.