Article

Mapping Data-Driven Research Impact Science: The Role of Machine Learning and Artificial Intelligence

1 School of Computer, Data and Mathematical Sciences, Western Sydney University, Rydalmere, NSW 2116, Australia
2 Centre for Design Innovation, Swinburne University of Technology, Hawthorn, VIC 3122, Australia
3 Department of Geography, University of Karachi, Karachi 75270, Pakistan
4 Institute of Space Science and Technology, University of Karachi, Karachi 75270, Pakistan
* Author to whom correspondence should be addressed.
Submission received: 16 February 2025 / Revised: 21 March 2025 / Accepted: 24 March 2025 / Published: 2 April 2025

Abstract

In an era of evolving scholarly ecosystems, machine learning (ML) and artificial intelligence (AI) have become pivotal in advancing research impact analysis. Despite their transformative potential, the fragmented body of literature in this domain necessitates consolidation to provide a comprehensive understanding of their applications in multidimensional impact assessment. This study bridges this gap by employing bibliometric methodologies, including co-authorship analysis, citation burst detection, and advanced topic modelling using BERTopic, to analyse a curated corpus of 1608 scholarly articles. Guided by three core research questions, this study investigates how ML and AI enhance research impact evaluation, identifies dominant methodologies, and outlines future research directions. The findings underscore the transformative potential of ML and AI to augment traditional bibliometric indicators by uncovering latent patterns in collaboration networks, institutional influence, and knowledge dissemination. In particular, the scalability and semantic depth of BERTopic in thematic extraction, combined with the visualisation capabilities of tools such as CiteSpace and VOSviewer, provide novel insights into the dynamic interplay of scholarly contributions across dimensions. Theoretically, this research extends the scientometric discourse by integrating advanced computational techniques and reconfiguring established paradigms for assessing research contributions. Practically, it provides actionable insights for researchers, institutions, and policymakers, enabling enhanced strategic decision-making and visibility of impactful research. By proposing a robust, data-driven framework, this study lays the groundwork for holistic and equitable research impact evaluation, addressing its academic, societal, and economic dimensions.

1. Introduction

Research impact evaluation has traditionally relied on conventional academic metrics such as citations, the h-index, and journal impact factors to assess scholarly influence within academic communities [1,2]. While these metrics provide valuable insights into academic productivity, they often fail to capture the broader dimensions of research influence, such as societal, economic, and institutional impacts [3]. The emergence of research impact science has sought to address these limitations by developing multidimensional frameworks that encompass diverse factors, including collaboration networks, institutional reputation, knowledge dissemination, and social equity considerations, such as gender representation and authorship biases [4]. However, the integration of advanced computational techniques, particularly machine learning (ML) and artificial intelligence (AI), into these frameworks remains underexplored.
This study builds upon a broader exploration of the role of data science in research impact assessment, which examines how data-driven approaches contribute to evaluating and managing research impact. While an earlier study provided a comprehensive literature-based analysis of data science applications in this domain, the current study focuses on the contributions of machine learning (ML) and artificial intelligence (AI). Specifically, it investigates how ML and AI can enhance research impact evaluation by uncovering dynamic insights, such as emerging themes, interdisciplinary connections, and key contributing entities. To provide a structured analysis, this study addresses the following sub-questions:
  • What is the current status of research impact evaluation using machine learning and artificial intelligence?
  • What are the prominent methodologies and emerging research themes in applying machine learning and artificial intelligence to research impact assessment?
  • What are the key directions for future research in integrating advanced computational techniques into multidimensional research impact evaluation?
Advancements in machine learning (ML) and artificial intelligence (AI) offer transformative opportunities to enhance the evaluation of research impact. These technologies excel in processing large datasets, detecting latent patterns, and generating predictive insights that traditional methods struggle to uncover [5]. For instance, machine learning models have been used to forecast citation trends, analyse collaboration networks, and evaluate institutional research performance through big data analytics. Topic modelling algorithms, such as bidirectional encoder representations from transformers (BERT), further demonstrate the ability to identify emerging research areas and their societal implications [6].
This study aimed to consolidate the fragmented literature on research impact evaluation by systematically addressing the sub-questions and providing insights into how ML and AI methodologies shape this evolving field. Advanced bibliometric tools, including VOSviewer for mapping scholarly networks, CiteSpace for detecting citation bursts, and BERT-based topic modelling for uncovering thematic trends, were employed to comprehensively explore these dimensions. This focused investigation extends the previous exploration of data science methodologies, shifting the emphasis from broader data-driven approaches to the specific contributions of machine learning (ML) and artificial intelligence (AI) in research impact assessment.
The contributions are both theoretical and exploratory. Theoretically, they advance scientometric discourse by integrating ML and AI into frameworks that assess research contributions, complementing traditional bibliometric indicators. From an exploratory standpoint, they provide a foundational understanding of how computational techniques can redefine research impact evaluation, paving the way for future empirical studies. These insights are valuable for academic institutions, researchers, and policymakers, as they highlight how ML and AI can enhance the understanding of collaboration patterns, guide funding allocation, and improve institutional strategies through advanced analytics.
The paper is structured as follows. Section 2 discusses the current landscape of research impact evaluation and its intersections with ML and AI methodologies. Section 3 outlines the computational techniques, including bibliometric analysis and topic modelling. Section 4 presents key findings on scholarly networks, thematic trends, and emerging methodologies. Section 5 explores the implications of these findings for research impact in science, followed by Section 6, which summarises key insights and suggests directions for future research.

2. Review of Literature

2.1. Advancing Research Impact Evaluation Through Data Science and Computational Methods

The evaluation of research impact has traditionally relied on citation-based metrics such as citation counts, the h-index, and the journal impact factor (JIF), which primarily measure academic influence within scholarly communities [7,8,9]. While these metrics provide valuable insights into scientific productivity, they often fail to capture broader societal, economic, and interdisciplinary research contributions. Traditional citation-based indicators exhibit disciplinary biases, time lags, and susceptibility to manipulation, including self-citations, which can distort research impact assessment [10,11,12]. Furthermore, these retrospective indicators struggle to reflect real-time influence, particularly in fast-evolving scientific domains, where immediate engagement and applicability are crucial [13]. In response, alternative metrics (altmetrics) have emerged, offering a broader, more immediate assessment of research impact by incorporating social media engagement, online article views, policy citations, and public discourse [9,11,14]. These alternative indicators enable a more comprehensive evaluation of research influence, which extends beyond academia to policymaking, industry applications, and public engagement.
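As a concrete reference point for the citation-based indicators discussed above, the h-index can be computed in a few lines. The sketch below is a minimal, stdlib-only illustration of the definition (the largest h such that h papers each have at least h citations), not a replication of any particular database's implementation.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers each have at least 4 citations
```

Compactness is precisely the critique raised above: a single integer cannot reflect societal uptake, policy influence, or disciplinary citation norms.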
Integrating data science methodologies has significantly enhanced the analytical capabilities of research impact evaluation. Advances in machine learning (ML), natural language processing (NLP), and big data analytics have enabled the detection of hidden patterns, thematic evolution, and predictive modelling in scholarly communication. Machine learning techniques have been widely employed to improve citation prediction models, allowing researchers to anticipate the potential influence of scholarly works based on early citation trends, co-authorship networks, and research themes [15,16]. NLP-driven approaches, such as latent Dirichlet allocation (LDA) and BERTopic modelling, provide semantic analysis of large text corpora, allowing for the categorisation of research publications, the detection of emerging interdisciplinary themes, and the identification of research frontiers [17,18]. Additionally, big data analytics has enabled the visualisation of co-citation networks, collaboration structures, and citation bursts, contributing to a more comprehensive and dynamic understanding of research impact [19]. However, despite these advancements, challenges remain in managing large-scale, heterogeneous bibliometric datasets, requiring robust computational infrastructures, standardised metadata, and interoperable bibliographic databases for effective utilisation [20].
The growing application of predictive models and network analysis in research impact assessment has expanded the capabilities of data-driven scientometric evaluations. These models leverage historical bibliometric trends, citation trajectories, and social network structures to forecast the potential influence of research outputs [21]. Predictive analytics helps funding agencies and research institutions in strategic decision-making by identifying high-impact research areas, forecasting knowledge dissemination patterns, and optimising funding allocation [22,23]. Moreover, network analysis enables mapping research collaborations and institutional partnerships, providing insights into key contributors, interdisciplinary research hubs, and emerging centres of excellence. These analytical techniques enhance international scientific collaboration, strengthen institutional research strategies, and optimise research investments.
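The forecasting idea described above can be illustrated with a deliberately simple model: an ordinary least-squares trend fitted to yearly citation counts and extrapolated one year ahead. Real citation-prediction models use far richer features (co-authorship structure, venue, early altmetrics); this sketch, with invented numbers, only shows the basic extrapolation step.

```python
def fit_trend(years, counts):
    """Ordinary least-squares slope and intercept for yearly counts."""
    n = len(years)
    mx, my = sum(years) / n, sum(counts) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, counts))
    slope = sxy / sxx
    return slope, my - slope * mx

def forecast(years, counts, target_year):
    """Extrapolate the fitted linear trend to a future year."""
    slope, intercept = fit_trend(years, counts)
    return slope * target_year + intercept

# Invented yearly citation counts for a single paper.
years = [2019, 2020, 2021, 2022, 2023]
counts = [3, 5, 8, 12, 15]
print(round(forecast(years, counts, 2024), 1))  # 17.9
```

The limitation is equally visible: a linear trend cannot anticipate the sudden surges that the citation burst analysis discussed below is designed to detect.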
By integrating advanced computational methodologies with traditional bibliometric indicators, research impact assessment is transitioning from static, citation-based evaluation models toward a more predictive, dynamic, and multidimensional framework. This paradigm shift enables a more nuanced understanding of the academic, societal, and economic implications of research, fostering a more data-driven and strategically informed approach to research evaluation. However, adopting these advanced techniques must be accompanied by careful consideration of data integrity, ethical AI applications, and interdisciplinary collaboration to ensure that research impact assessments remain transparent, equitable, and reflective of the evolving scholarly landscape.

2.2. Bibliometric Analysis Overview

Bibliometric analysis has become a cornerstone for understanding and visualising the development and impact of research across various domains. This methodology offers a systematic approach to assess scholarly output, citation patterns, and thematic trends, thereby providing valuable insights into the evolution of knowledge within specific fields [24].
Citation analysis is a foundational approach within bibliometric methods, quantitatively assessing research impact by analysing how frequently publications are cited. This technique offers profound insights into the influence and relevance of articles within specific disciplines, helping identify key contributions that shape scholarly discourse [25]. Co-citation analysis has proven crucial for understanding the intellectual structure of research fields by revealing the relationships between different areas of study through the citation patterns of document pairs. This method, particularly effective when combined with visualisation tools such as VOSviewer and CiteSpace, facilitates the mapping of co-citation networks and the identification of emerging themes [26].
Citation burst analysis identifies sudden increases in citations, often indicating the emergence of new research topics or significant shifts within a field. This technique, along with identifying keywords with citation bursts, provides powerful tools for tracking the dynamic nature of scholarly activity and pinpointing rapidly evolving research areas [27].
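CiteSpace implements Kleinberg's burst-detection algorithm; the sketch below is a much simpler stand-in that flags years whose citation count exceeds the trailing-window mean by a chosen number of standard deviations. It illustrates what a "burst" is, not the actual algorithm.

```python
from statistics import mean, stdev

def detect_bursts(counts, window=3, threshold=2.0):
    """Flag indices whose count exceeds the trailing-window mean by
    `threshold` standard deviations. A simplified stand-in for the
    Kleinberg burst model used by CiteSpace."""
    bursts = []
    for i in range(window, len(counts)):
        prior = counts[i - window:i]
        mu, sigma = mean(prior), stdev(prior)
        if sigma == 0:
            sigma = 1.0  # avoid division by zero on flat windows
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts

yearly = [2, 3, 2, 3, 12, 14, 4, 3]
print(detect_bursts(yearly))  # [4]: the jump to 12 citations stands out
```

Note that only the onset of the surge is flagged; once high counts enter the trailing window, the baseline itself rises, which is why principled burst models track sustained elevated states rather than single-year spikes.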
Co-occurrence analysis and keyword analysis are instrumental in mapping the core topics and trends within specific research domains. These methods help visualise the relationships between key terms and facilitate the identification of dominant research topics and emerging trends, thereby providing a snapshot of the thematic focus in the field [28]. Furthermore, keyword time trend analysis offers a temporal perspective, tracking how research foci shift over time and highlighting the emergence or decline of specific topics [29].
Network analysis and co-authorship analysis are pivotal for elucidating the relationships between authors, institutions, and research topics. These techniques map the structural framework of research fields and provide insights into the dynamics of scholarly communication, revealing how collaboration shapes research outputs and identifying influential networks within the academic community [30]. Cluster analysis further aids in grouping related items like keywords or citations into distinct clusters, thus offering a comprehensive view of the intellectual landscape of research domains [31].
Country analysis and geographic distribution analysis assess the geographical spread of research output, identifying leading countries or regions in scholarly contributions and illustrating the interconnectedness of the global research community [29]. Similarly, research productivity analysis measures the output of publications over time, providing insights into scholarly productivity across different entities and helping map the influence of academic journals within various fields [25].
Bibliometric network visualisation, facilitated by tools such as VOSviewer, CiteSpace, and the Bibliometrix R package, allows for the graphical representation of complex relationships within research data. Among these, VOSviewer, developed by the Centre for Science and Technology Studies (CWTS) at Leiden University, has been instrumental in advancing bibliometric mapping techniques [32]. Unlike conventional visualisation tools, VOSviewer employs the VOS (visualization of similarities) mapping technique, which allows for high-resolution, distance-based visualisation of bibliometric networks while preserving structural integrity and thematic coherence. The software’s scalability in handling large-scale bibliometric datasets and its interactive exploration capabilities (e.g., zooming, density visualisation, and cluster-specific filtering) make it particularly valuable for analysing co-citation networks, keyword co-occurrences, and bibliographic coupling.
These functionalities enable researchers to detect citation bursts, visualise thematic evolution, and conduct multidimensional analysis of scholarly communication trends, contributing to a more nuanced understanding of research impact and knowledge dissemination [33,34].
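VOSviewer's distance-based maps rest on normalising raw co-occurrence counts by how common each item is, so that frequent keywords do not dominate the map simply because they appear everywhere. The snippet below sketches an association-strength-style similarity (co-occurrences divided by the product of total occurrence counts); VOSviewer's exact normalisation may differ by a constant factor.

```python
def association_strength(cooccur, total_i, total_j):
    """Association-strength-style similarity: the co-occurrence count
    normalised by the product of each item's total occurrences.
    (Simplified; the exact VOSviewer formula may include a constant.)"""
    return cooccur / (total_i * total_j)

# Two keyword pairs with the same raw co-occurrence count of 10:
print(association_strength(10, 50, 40))    # 0.005  — rare keywords, strong link
print(association_strength(10, 500, 400))  # 5e-05  — common keywords, weak link
```

The same raw count therefore yields a far stronger similarity for rare terms, which is what lets the layout algorithm place genuinely related items close together.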

2.3. Topic Modelling for Thematic Analysis

Topic modelling has become a vital tool for classifying academic literature, especially useful in scenarios where traditional techniques like manual reviews or bibliometric analysis fall short. The escalating volume of scholarly publications necessitates methods that can autonomously extract meaningful insights from extensive text corpora. Unlike labour-intensive manual reviews or bibliometric methods primarily focusing on citation networks, topic modelling leverages machine learning and natural language processing (NLP) to unearth latent themes, providing a scalable and nuanced approach to understanding research landscapes [17].
This technique is particularly beneficial in areas with substantial textual data, such as healthcare and social sciences, where it enhances the capability to parse and categorise large-scale information. In healthcare, for instance, topic modelling facilitates the identification of thematic clusters around specific diseases and treatments, offering a more efficient way to monitor medical advancements than traditional reviews could achieve [17]. In social sciences, it aids in dissecting vast datasets of academic papers and articles, tracking shifts in political ideologies, economic behaviours, and social movements.
Topic modelling not only enhances efficiency and scalability but also delves deeper than bibliometric tools by analysing content directly. It identifies trends and shifts across various disciplines, revealing emerging study areas and interdisciplinary connections that citation metrics might miss. Techniques like latent Dirichlet allocation (LDA) are prominent for their ability to model each document as a mixture of topics, thus clustering words that frequently co-occur to highlight predominant themes [35].
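LDA's generative view ("each document as a mixture of topics, each topic a distribution over words") can be made concrete with a toy sampler. The topic labels and word lists below are invented for illustration; real LDA infers these distributions from data rather than being handed them.

```python
import random

# Invented topics: each topic is a word list (uniform word distribution).
topics = {
    "networks": ["citation", "graph", "coauthor", "cluster"],
    "impact":   ["policy", "society", "funding", "metric"],
}

def generate_doc(mixture, length, rng):
    """Sample a document the way LDA's generative story describes:
    pick a topic per word according to the mixture weights, then
    draw a word uniformly from that topic."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    words = []
    for _ in range(length):
        topic = rng.choices(names, weights=weights)[0]
        words.append(rng.choice(topics[topic]))
    return words

doc = generate_doc({"networks": 0.7, "impact": 0.3}, 8, random.Random(1))
print(doc)  # eight words, drawn mostly from the "networks" topic
```

Fitting LDA is the inverse problem: given only the documents, recover the mixtures and topic-word distributions, which is why co-occurring words end up clustered into the same topic.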
While latent Dirichlet allocation (LDA) is widely used, its reliance on a bag-of-words approach limits its ability to capture the contextual relationships between words. Alternatives such as latent semantic analysis (LSA) and non-negative matrix factorization (NMF) aim to address these limitations by revealing hidden semantic structures and reducing dimensionality. However, they too struggle to maintain topic coherence and to capture more complex semantic relationships [36,37].
The evolution of data science has introduced more sophisticated models, such as BERTopic, which utilises transformer-based embeddings like BERT to generate context-aware text representations. This advancement allows for deeper semantic analysis, distinguishing subtle differences in terminology and uncovering complex themes in ways that traditional methods cannot match. These properties make BERTopic especially valuable in fields requiring semantic depth, such as biomedical research, legal analysis, and research impact assessment [6,38].
The literature demonstrates the adaptability of BERTopic modelling across datasets of varying sizes and document lengths, from small collections of long, complex texts to massive corpora of short entries such as tweets. For instance, studies analysing 144 articles on post-COVID-19 educational research [39], around 1000 Reddit posts on patient experiences [40], and the Great Resignation [41] illustrate BERTopic’s ability to derive meaningful insights from focused, smaller datasets. In contrast, larger datasets, such as millions of tweets analysing online propaganda tactics [42] or clinical notes assessing fall risks in older adults [43], showcase BERTopic’s scalability for more extensive analyses. Mid-sized datasets, like the hundreds of thousands of tweets examining public engagement on energy prices [44], balance comprehensive analysis and computational efficiency. This versatility highlights BERTopic’s effectiveness for qualitative and quantitative research across diverse fields, including healthcare, education, political science, and public opinion.
The literature synthesis highlights a significant shift in evaluating research impact, primarily driven by advancements in machine learning and artificial intelligence. This shift moves from traditional citation-based metrics to more holistic, data-driven approaches. Traditional metrics, although fundamental, often overlook the complex dimensions of research influence, particularly in anticipating future impacts. In response, the integration of ML and AI is revolutionising the field by forecasting research impact and dissecting its presence across various domains through sophisticated analytical techniques. Researchers can discern latent themes, intricate patterns, collaboration networks, and emerging trends within extensive datasets using advanced bibliometric analysis, citation burst analysis, and topic modelling. For instance, these techniques enable a detailed examination of the applications and influences of machine learning and artificial intelligence on the impact of research.

3. Methodology

This study employed a bibliometric analysis to systematically explore the influence of data science methodologies on various dimensions of research impact. The methodology described by Fahimnia et al. [45] was adopted, as illustrated in Figure 1. The methodology is structured into several key steps, each designed to ensure a comprehensive examination of the research landscape.

3.1. Data Collection and Preparation

This study’s data collection and preparation followed a systematic and rigorous approach to ensure the comprehensive retrieval, refinement, and relevance of literature for bibliometric analysis. The primary objective was to capture publications at the intersection of data science methodologies and research impact dimensions, forming a robust foundation for subsequent analyses. The methodology was designed to ensure that only high-quality, peer-reviewed research outputs were included, allowing for a meaningful and reliable assessment of how machine learning (ML) and artificial intelligence (AI) contribute to research impact evaluation.

3.1.1. Stage 1: Search Strategy

The Web of Science (WoS) Core Collection database was selected as the primary source of bibliometric data, given its extensive coverage of peer-reviewed literature, multidisciplinary indexing, and structured citation data, making it one of the most widely recognised and reliable sources for bibliometric research [46,47]. WoS provides longitudinal coverage from 1900 to the present, making it particularly suitable for tracking research impact trends over time. The use of WoS aligns with established bibliometric methodologies that have been widely validated in prior research [45].
The search was conducted on 31 August 2024 and was designed to capture publications explicitly addressing ML and AI applications in research impact analysis. The search strategy incorporated multiple layers of keyword filtering to achieve broad coverage without introducing excessive noise. The query terms were carefully selected based on pilot searches, expert input, and previous bibliometric studies and included core machine learning and AI methodologies, bibliometric techniques, and research evaluation frameworks. These terms were applied to titles, abstracts, and keywords to maximise recall while maintaining specificity (see Dataset 1 (https://github.com/mharsalan/ResearchImpact/blob/main/Dataset1.pdf, accessed on 21 March 2025) for the complete list of query terms).
The dataset was further refined based on publication type and language to enhance the quality and relevance of the retrieved records. The search results were restricted to English-language publications to maintain linguistic consistency and facilitate accurate text analysis. Furthermore, only journal articles, conference proceedings, and book chapters were included, as these publication types undergo rigorous peer review and represent substantial contributions to academic discourse. Grey literature, such as preprints and technical reports, was excluded to ensure methodological consistency and high data integrity. This rigorous filtering process yielded an initial dataset of 2706 records, ensuring broad yet high-quality coverage of the research landscape.

3.1.2. Stage 2: Results Screening and Refinement

The selection and screening of literature were crucial steps in refining the study’s focus and maintaining rigorous quality standards. Criteria were established to identify publications integrating machine learning and artificial intelligence technologies with research impact assessment.
ML and AI Inclusion Criteria: To qualify, studies were required to demonstrate significant use or development of ML and AI methodologies, such as linear regression, decision trees, and neural networks. This approach was deemed essential to ensure that the literature was technologically relevant and directly aligned with the analysis’s objectives.
Research Impact Evaluation Criteria: Publications were also required to explore various dimensions of research impact. This included traditional academic metrics, such as citation counts and the h-index, alongside broader impacts encompassing policy influence, market applications, and environmental effects. These dual-focus criteria ensured that the literature comprehensively addressed the multifaceted effects of ML and AI on research impact.
The search initially returned 2706 articles. These were meticulously screened to exclude duplicates, non-English content, and studies that failed to meet the thematic focus of ML/AI and research impact. The relevance of each article was first assessed by title, followed by a more detailed review of the abstract to confirm alignment with the study’s core themes.
A team of three independent researchers reviewed the titles and abstracts of the remaining publications to assess thematic alignment with the study objectives. Where necessary, full-text reviews were conducted to confirm relevance. To enhance consistency and reliability, discrepancies in article selection were discussed and resolved through consensus. If disagreements arose, an additional reviewer conducted a secondary assessment to ensure fair inclusion criteria were applied consistently. This methodical approach resulted in a final dataset of 1608 high-quality publications, providing a reliable foundation for bibliometric and computational analysis.
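A simplified version of the automated part of this screening pipeline can be expressed in code. The record fields below (`doi`, `language`, `title`, `abstract`) are illustrative, not the actual Web of Science export schema, and the keyword filter stands in for the manual thematic review described above.

```python
def screen(records, keywords):
    """Deduplicate by DOI and keep English-language records whose title
    or abstract mentions at least one thematic keyword. Field names are
    illustrative, not the real WoS export schema."""
    seen, kept = set(), []
    for rec in records:
        doi = rec.get("doi", "").lower()
        if doi and doi in seen:
            continue  # duplicate record
        if rec.get("language") != "English":
            continue  # outside the language criterion
        text = (rec.get("title", "") + " " + rec.get("abstract", "")).lower()
        if any(k in text for k in keywords):
            if doi:
                seen.add(doi)
            kept.append(rec)
    return kept

sample = [
    {"doi": "10.1/a", "language": "English",
     "title": "Machine learning for citation analysis", "abstract": ""},
    {"doi": "10.1/a", "language": "English",
     "title": "Machine learning for citation analysis", "abstract": ""},
    {"doi": "10.1/b", "language": "German",
     "title": "Maschinelles Lernen", "abstract": ""},
]
print(len(screen(sample, ["machine learning", "research impact"])))  # 1
```

In practice such a filter only pre-screens; borderline cases still go to the human reviewers, as the consensus procedure above describes.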
The structured and iterative data collection and screening approach ensured that the final dataset was comprehensive, methodologically sound, and aligned with the study’s research focus. By maintaining rigorous selection criteria, this study contributes to a reliable, high-quality bibliometric analysis of how ML and AI methodologies shape research impact evaluation (see Figure 1).

3.2. Bibliometric Analysis

The bibliometric analysis conducted in this study employed a comprehensive set of analytical techniques to systematically evaluate research impact and collaboration patterns within the domain of machine learning (ML) and artificial intelligence (AI) for research impact. Given the rapid expansion of these fields, bibliometric methods were selected to capture research productivity, intellectual structures, collaboration networks, and emerging trends. This section details the methodological choices and their underlying rationale, ensuring that the selected techniques provided comprehensive insights.
Productivity and General Data Analysis: Publication productivity and related metrics were assessed using the R package Bibliometrix (version 4.3.2, with R version 4.4.2), a well-established tool for large-scale bibliometric analysis. This foundational analysis provided quantitative insights into publication trends, journal distribution, and the productivity of authors and institutions. By analysing publication output over time, this study identified key contributors and prolific research hubs within ML and AI, shedding light on the field’s evolution. This approach was essential in establishing a baseline understanding of the research landscape, further guiding the in-depth network and citation analyses.
Bibliometric Network Analysis: The network-based approach in bibliometrics is instrumental in revealing the structural relationships and knowledge flow within a research domain. VOSviewer (version 1.6.20) was used to construct initial networks, focusing on co-authorship, co-occurrences, citation, co-citation, and bibliographic coupling. These techniques were selected because they provide complementary insights into collaboration structures, thematic relationships, and intellectual influences within the research community.
  • Co-authorship networks were used to map collaborative patterns among researchers, institutions, and countries, revealing key influencers and research clusters.
  • Co-occurrence networks were employed to analyse frequently appearing keywords, enabling the identification of emerging themes and dominant research areas.
  • Citation networks measured direct academic influence, helping to trace the impact of seminal papers.
  • Co-citation networks identified intellectual connections by linking papers that are frequently cited together, reflecting conceptual relationships within the field.
  • Bibliographic coupling was used to group papers sharing common references, highlighting emerging research directions and potential knowledge gaps.
Gephi (version 0.10) was used to further refine network attributes, such as degree centrality, betweenness centrality, and modularity clustering, to enhance the analytical depth. These measures enabled a granular exploration of collaborative dynamics and thematic structures, allowing for the identification of influential entities and emerging subfields. The network analysis was systematically organised as follows:
  • Researchers’ Analysis: Evaluated authors’ productivity and impact, mapping various networks among authors.
  • Institution Analysis: Assessed institutional productivity and explored collaborative networks to gauge the influence among institutions.
  • Country Analysis: Analysed productivity and network dynamics at the country level, providing insights into global research collaborations.
  • Publication Source Analysis: Investigated the roles of various journals and venues through productivity metrics and network analysis.
  • Document Analysis: Conducted a detailed analysis of documents to map citation networks and identify critical publications.
  • Keyword Analysis: Utilised co-occurrence analysis of authors’ keywords and KeyWords Plus to identify prevalent and emerging themes.
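At its simplest, the co-authorship mapping described above reduces to building an undirected network from author lists and reading off centrality measures. The sketch below computes degree centrality only, with invented author lists; tools like Gephi layer betweenness centrality and modularity clustering on top of the same structure.

```python
from collections import defaultdict
from itertools import combinations

def coauthor_degree(papers):
    """Degree centrality in a co-authorship network: the number of
    distinct co-authors each author has across all papers."""
    neighbours = defaultdict(set)
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {author: len(n) for author, n in neighbours.items()}

# Invented author lists, one list per paper.
papers = [["A", "B", "C"], ["A", "C"], ["C", "D"]]
print(coauthor_degree(papers)["C"])  # 3: C has co-authored with A, B, and D
```

The same adjacency structure, built over institutions or countries instead of authors, yields the institution- and country-level networks listed above.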
Citation Burst Analysis: To complement the network-based approach, a citation burst analysis was performed using CiteSpace (version 6.3.R3) to detect significant shifts in citation patterns over time. This method was chosen because it is particularly effective in identifying emerging research, disruptive innovations, and trends. Citation bursts indicate periods during which certain works experience a sudden surge in citations, often marking paradigm shifts or the introduction of novel methodologies. By analysing citation bursts, this study aimed to highlight transformative contributions that have shaped the ML and AI research landscape.

3.3. Topic Modelling

The analysis applied topic modelling techniques to explore a dataset of 1608 research articles at the intersection of data science and research impact dimensions. Topics within the text corpus were systematically categorised using BERTopic (version 0.16.4), incorporating BERT and other natural language processing (NLP) tools. BERTopic was selected over traditional topic modelling methods, such as latent Dirichlet allocation (LDA), latent semantic analysis (LSA), and non-negative matrix factorisation (NMF), due to its superior ability to generate context-aware topic representations using transformer-based embeddings [6,38]. Unlike LDA, which relies on a bag-of-words approach and assigns words to topics based on co-occurrence probabilities, BERTopic preserves semantic relationships between words, making it particularly effective for interdisciplinary research impact analysis [35]. While LSA and NMF attempt to improve topic coherence through dimensionality reduction techniques, they still face challenges in accurately capturing nuanced topic variations and require predefined topic numbers, limiting their flexibility in large, heterogeneous corpora [36,37].
BERTopic addresses these limitations by utilising transformer-based embeddings such as BERT to generate deep semantic representations of text, allowing it to detect emerging themes and complex patterns more effectively [6]. Additionally, it integrates UMAP for dimensionality reduction and HDBSCAN for clustering, eliminating the need for manual specification of the number of topics and making it more adaptable to evolving research landscapes. This approach enhances both scalability and interpretability, ensuring that the model can effectively process varying text lengths, from long-form research papers to shorter abstracts, tweets, and policy documents [42,43]. The model has been successfully applied across datasets of different sizes, demonstrating its effectiveness in both small, focused studies and large-scale bibliometric analyses [39,40,41]. Given its ability to handle diverse textual formats while maintaining high topic coherence, BERTopic was deemed the most suitable tool for analysing the thematic evolution of research impact literature in this study. The approach involved several stages, as follows.
Data Preprocessing: Titles, authors, and associated keywords were merged, and preprocessing was applied. The text was cleaned by removing special characters, standardising it to lowercase, and excluding common stop words. Lemmatisation was also conducted to treat different forms of a word as a single unit, ensuring consistency across the dataset.
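A minimal sketch of these preprocessing steps follows; the stop-word list and lemma map are tiny illustrative stand-ins for the full resources (e.g. an NLTK or spaCy lemmatiser) a real pipeline would use.

```python
import re

STOP_WORDS = {"the", "of", "and", "in", "for", "a", "an", "on"}
# Tiny illustrative lemma map; a real pipeline would use a full lemmatiser.
LEMMAS = {"networks": "network", "analyses": "analysis", "models": "model"}

def preprocess(text):
    """Lowercase, strip special characters, drop stop words, and lemmatise."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = [LEMMAS.get(tok, tok)
              for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)

print(preprocess("Citation Networks: An Analysis of ML-Based Models!"))
# citation network analysis ml based model
```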
Model Selection and Embedding: Various sentence-transformer models were evaluated for suitability based on metrics such as Cv coherence and UMass coherence. Among the models tested, “gte-large-en-v1.5” was identified as the most effective due to its high coherence scores and low outlier percentage, indicating strong performance in semantic consistency.
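Of the two coherence measures, UMass is simple enough to sketch directly: it sums, over ordered pairs of top topic words, the log ratio of co-document frequency to the earlier word’s document frequency. The corpus and topic words below are hypothetical, and the sketch assumes every topic word occurs in the corpus.

```python
from itertools import combinations
from math import log

def umass_coherence(topic_words, corpus):
    """UMass coherence: sum over ordered top-word pairs (earlier, later) of
    log((co-document frequency + 1) / document frequency of earlier word).
    Assumes each topic word appears in at least one document."""
    doc_sets = [set(doc.split()) for doc in corpus]
    def df(*words):
        return sum(all(w in d for w in words) for d in doc_sets)
    return sum(
        log((df(earlier, later) + 1) / df(earlier))
        for earlier, later in combinations(topic_words, 2)
    )

corpus = [
    "machine learning citation",
    "machine learning impact",
    "citation impact",
]
print(round(umass_coherence(["machine", "learning"], corpus), 3))  # 0.405
```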
Dimensionality Reduction and Clustering: UMAP was applied to reduce dimensionality, followed by clustering with the HDBSCAN algorithm to identify meaningful topic clusters. This method allowed us to group similar articles, facilitating clearer thematic analysis. The minimum cluster size was set to 10, ensuring that the smallest cluster comprised at least 10 research articles.
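HDBSCAN does not take a fixed number of topics; instead, the minimum cluster size governs which groups survive, with smaller groups treated as outliers. The sketch below mimics only that filtering effect on hypothetical cluster labels; it is not HDBSCAN itself, which also performs density-based cluster extraction.

```python
from collections import Counter

def enforce_min_cluster_size(labels, min_size=10):
    """Relabel clusters smaller than min_size as outliers (-1),
    mirroring the effect of HDBSCAN's min_cluster_size parameter."""
    sizes = Counter(labels)
    return [lab if sizes[lab] >= min_size else -1 for lab in labels]

labels = [0] * 12 + [1] * 4 + [2] * 10   # hypothetical cluster assignments
filtered = enforce_min_cluster_size(labels)
print(sorted(set(filtered)))  # [-1, 0, 2]: the 4-article cluster is dropped
```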
Representation: Advanced NLP techniques, including KeyBERT and maximal marginal relevance (MMR), were employed alongside the Llama-2 model to refine and visualise the topic representation. These tools helped in enhancing interpretability by providing a comprehensive and nuanced depiction of the topics.
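MMR can be sketched compactly: terms are picked greedily to balance relevance to the document against redundancy with already-selected terms. The 2-D vectors below are hypothetical stand-ins for KeyBERT-style embeddings, and the trade-off parameter λ = 0.5 is an illustrative choice.

```python
from math import sqrt

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den

def mmr(doc_vec, candidates, top_n=2, lam=0.5):
    """Maximal marginal relevance: greedily select terms that are relevant
    to the document yet dissimilar to terms already selected."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < top_n:
        def score(term):
            relevance = cosine(doc_vec, remaining[term])
            redundancy = max((cosine(remaining[term], candidates[s])
                              for s in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected

# Hypothetical 2-D embeddings standing in for candidate-term vectors.
doc = (1.0, 0.2)
terms = {"bibliometrics": (0.9, 0.1),
         "scientometrics": (0.95, 0.15),
         "education": (0.1, 1.0)}
print(mmr(doc, terms))  # ['scientometrics', 'education']
```

Note how the redundancy penalty picks “education” over the near-duplicate “bibliometrics” for the second slot, which is exactly the diversification MMR contributes to topic representations.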
Interpretation and Thematic Categorization: After identifying the topics, they were grouped into broader, thematically relevant categories related to research impact and ML/AI. This categorisation process was iteratively refined through collaborative discussions within the research team to ensure consistency, objectivity, and alignment with the study’s goals.
Implementation Details: The topic modelling pipeline was implemented using Google Colab’s GPU T4 infrastructure, employing the “meta-llama/Llama-2-7b-chat-hf” model and associated techniques for optimal performance.
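Putting the stages together, the following is an untested configuration sketch of such a pipeline: the model names come from the text, but all other hyperparameters are illustrative assumptions, and the Llama-2 labelling step is omitted.

```python
# Configuration sketch only (not the authors' exact code); hyperparameters
# other than the named models and min_cluster_size=10 are assumptions.
from bertopic import BERTopic
from bertopic.representation import KeyBERTInspired, MaximalMarginalRelevance
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer
from umap import UMAP

docs = [
    "machine learning approaches to citation analysis",
    "bibliometric evaluation of research impact",
    # a real run would use the full corpus of 1608 preprocessed articles
]

topic_model = BERTopic(
    embedding_model=SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5",
                                        trust_remote_code=True),
    umap_model=UMAP(n_neighbors=15, n_components=5,
                    metric="cosine", random_state=42),
    hdbscan_model=HDBSCAN(min_cluster_size=10, prediction_data=True),
    # The study additionally refined topic labels with a Llama-2 chat model
    # ("meta-llama/Llama-2-7b-chat-hf"); that step is omitted here.
    representation_model=[KeyBERTInspired(),
                          MaximalMarginalRelevance(diversity=0.3)],
)
topics, probabilities = topic_model.fit_transform(docs)
```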

4. Results

4.1. Analysis of Published Documents

The analysed dataset, spanning from 1967 to 2024, revealed significant trends in scientific production and citation growth. Publications have increased substantially since 2010, peaking at 245 documents in 2022 before decreasing slightly to 232 and 228 documents in the following two years. As illustrated in Figure 2, this growth reflects the expanding body of knowledge and heightened academic engagement in this field, especially over the past decade.
Citation data further underscore the academic influence of the field, with an average of 10.16 citations per document and 1.878 citations per year per document. The year 2021 marked a high point with 1746 citations, indicating that research impact extends beyond publication volume to include the quality and relevance of contributions. Other notable years include 2017, 2018, and 2020, with significant citation counts, highlighting periods of impactful research that continue to influence scholarly discourse.
The correlation between publication output and citation growth suggests an increase in the quantity and quality of contributions, reflecting the field’s dynamic nature and evolution over time. These findings emphasise the importance of research productivity and scholarly recognition in advancing knowledge, with periods of high publication output often preceding citation peaks, which are indicative of the long-term value of these contributions.
The analysis of the 1608 most cited documents related to the use of machine learning and artificial intelligence in research impact dimensions revealed significant trends in citation patterns and academic influence. The distribution of total citations (TCs) was markedly right-skewed, with a mean of 48.9 and a median of 16, highlighting that while many documents received moderate citations, a select few exerted substantial influence. For example, Harder VS (2010, Psychological Methods) [48] tops the list with 562 citations, reflecting the concentrated scholarly impact of key publications. Notably, as seen in Sheller MJ (2020, Scientific Reports) [49], the maximum annual citation rate reaches 92 citations per year, indicating the ongoing relevance of specific contributions.
The dataset primarily features recent publications, with a median age of 3 years and a mean age of 4.83 years. However, it also includes older, influential works such as Tang J (2009, KDD Conference) [50] and Bollen J (2009, PLoS ONE) [51], which continue to play a critical role in shaping the discourse on AI and machine learning in research evaluation. The blend of recent high-impact publications and enduring older works highlights the evolving and dynamic nature of research in this field. The analysis illustrates an uneven citation distribution, where a few seminal works on ML/AI-driven research impact dimensions dominate the scholarly conversation, significantly shaping current methodologies and research directions.

4.1.1. Document Citation Analysis

Figure 3 shows the citation network analysis using documents as the unit of analysis, revealing collaboration patterns among key entities measured through normalised citation scores and relationship weights. The network consists of nine distinct clusters varying in size and collaboration intensity. Cluster 1, the largest, with 87 members, indicates a dense and diverse collaboration network, while Cluster 9, with 25 members, represents a more focused group around niche topics. Clusters 6 and 5, with 33 and 43 members, exhibit moderate collaboration, reflecting varied, yet concentrated academic interactions. This diversity highlights different levels of collaborative intensity and thematic cohesion across the network.
Based on citation scores, the top 10 entities within these clusters show unique collaborative behaviours. In Cluster 1, entities like Li (2018b) [52], Kumar (2023) [53], Wamba (2023) [54], and Ezugwu (2021) [55] engage with members such as Golowko (2019) [56] and Lopez-Martinez (2020) [57] with equal-weight collaborations, maximising visibility and impact. Cluster 6, with Chen (2022a) [58], Huang (2023b) [59], and Lodge (2023) [60], features balanced collaborations with Kartal (2024) [61] and Guo (2024) [62], indicating a cohesive network that enhances members’ academic influence. In Cluster 9, Van Eck (2007) [63] collaborates with Bhattacharya (2019) [64] and Garcia-Sanchez (2019) [65], reflecting focused, high-impact interactions.
Individual Cited Documents: Individual analysis shows how these collaboration patterns translate into citation impact. Li (2018b) [52] leads with a citation score of 17.8, maintaining strong relationships with Golowko (2019) [56] and Taskin (2019) [66]. Kumar (2023) [53], with a score of 16.7, engages strategically with Wamba (2023) [54] and Akrami (2023) [67], contributing to its influence. In Cluster 5, Onan (2019) [68] scores 14.8 through balanced collaboration with Huang (2022b) [69] and Zhang (2018) [70]. In Cluster 6, Chen (2022a) [58] and Huang (2023b) [59] scored 14.0 and 13.7, respectively, expanding their networks through broad, evenly weighted collaborations with Guo (2024) [62] and Lodge (2023) [60]. Wamba (2023) [54] and Ezugwu (2021) [55] in Cluster 1, with scores of 13.1 and 12.3, maintain focused collaborations with Kumar (2023) [53] and Dalavi (2022) [71]. Lodge (2023) [60], scoring 11.5 in Cluster 6, engages with both top and non-top entities in specialised collaborations, while Van Eck (2007) [63], with a score of 11.4 in Cluster 9, focuses on niche, specialised collaborations. Barbieri (2013) [72], scoring 10.4 in Cluster 2, maintains a stable collaborative approach, contributing to its academic impact.

4.1.2. Document Co-Citation Analysis

Figure 4 shows the co-citation network analysis using documents as the unit of analysis, revealing the intricate structure of scholarly communication through the clustering of key documents. The study identifies five clusters of co-cited documents, each representing thematic or methodological groupings. Cluster 1, the largest, with 91 members, has a broad interdisciplinary scope encompassing behavioural science, information metrics, and network analysis. Key documents include Hirsch JE, 2005 (Proceedings of the National Academy of Sciences of the United States of America) [8] with the highest collaboration score of 740, frequently co-cited with Egghe L, 2006 (Scientometrics) (weight 30) [73]. Other influential works, such as Ajzen I, 1991 (Organizational Behavior and Human Decision Processes) [74] and Bastian M, 2009 [75], further establish Cluster 1 as a central interdisciplinary research hub.
Cluster 2 consists of 87 publications and primarily centres around the themes of informetrics and bibliometrics. Documents like Van Eck NJ, 2010 (Scientometrics) [32] (score 551) and Donthu N, 2021 (Journal of Business Research) [76] (score 466) are central here. Their co-citations with works like Abramo G, 2019 (Journal of Informetrics) [77] and Aksnes DW, 2003 (Research Evaluation) [78] reflect advancements in bibliometric methods and visualisation. Van Eck NJ, 2010 [32] is co-cited with Aria M, 2017 (Journal of Informetrics) [79] (weight 32), indicating shared methodological innovations. Cluster 3, with 45 members, is centred on machine learning and data science, including Blei DM, 2003 (Journal of Machine Learning Research) [17] (score 739) and Griffiths TL, 2004 (Proceedings of the National Academy of Sciences of the United States of America) [80] (weight 39), highlighting their impact on probabilistic topic modelling. Other significant works include Bornmann L, 2015 (Journal of the Association for Information Science and Technology) [81] and Boyack KW, 2011 (PLoS ONE) [82], reflecting a focus on data-driven methodologies.
Cluster 4, with 42 members, emphasises computational linguistics and informatics. Key documents include Teufel S, 2006 (Proceedings of the 2006 conference on empirical methods in natural language processing) [83] and Bonzi S, 1982 (Journal of the American Society for Information Science) [84], which focused on language processing and information retrieval. Teufel S’s work has a strong influence, with co-citations linking to foundational texts like Breiman L, 2001 (Machine Learning) [85]. Cluster 5, the smallest, with 35 members, represents a niche research area. While more specialised, these works are highly cited in specific contexts, indicating their focused, but significant contributions to their fields.
The top 10 nodes provide additional insights into the network’s structure. Hirsch JE, 2005 [8], with the highest collaboration score, shares strong co-citation ties with Egghe L, 2006 [73], reflecting their close scholarly dialogue. Similarly, Blei DM, 2003 [17] and Griffiths TL, 2004 [80] highlight their foundational roles in machine learning. Van Eck NJ, 2010’s [32] connections with Aria M, 2017 [79] and Donthu N, 2021 [76] illustrate the intersection of ideas within bibliometric research.

4.1.3. Document Bibliographic Coupling Analysis

Figure 5 shows the analysis of the bibliographic coupling network using documents as the unit of analysis, revealing a complex structure of research interconnections and thematic clusters based on shared citations. The network consists of five clusters, varying in size, suggesting diverse thematic focuses and research concentrations. Cluster 1 is the largest, with 130 members, indicating a broad research area with numerous interconnections, while Cluster 5, with 41 members, represents a more niche domain. Clusters 2, 3, and 4, with 97, 90, and 87 members, respectively, showcase substantial groups with defined research themes. Clusters 2 and 4 house most of the 10 most influential documents, identified based on their collaboration scores, indicating their centrality within their clusters.
Cluster 4, with 87 members, includes significant documents like Iqbal (2021) [86], Kilicoglu (2019) [87], and Zhu (2015a) [88]. Iqbal (2021) [86] leads with a collaboration score of 266, showing strong bibliographic coupling with Huang (2022a) [89], Jha (2017) [90], and Ihsan (2019) [91] (weight 14), indicating its pivotal role in shaping the cluster’s intellectual structure. Other notable members, like Wulff (2023) [92], Su (2019) [93], and Gowanlock (2013) [94], reinforce the cluster’s thematic cohesion around shared research topics. Cluster 2, with 97 members, includes key documents like Basilio (2021) [95], Maphosa (2023) [96], Chen (2020b) [97], and Goodell (2022) [98]. Basilio (2021) [95], with a collaboration score of 264, shares a strong connection with Basilio (2022) [99] (weight 32), reflecting continuity in research themes. Other connections, such as with Choi (2020) [100] and Bittermann (2018) [101], emphasise its role in forming influential scholarly ties. Maphosa (2023) [96], with a collaboration score of 241, connects strongly with Chen (2022a) [58] and Huang (2023b) [59], contributing to the cluster’s thematic depth.
Within these clusters, influential documents like Iqbal (2021) [86] and Basilio (2021) [95] stand out due to their high collaboration scores and strong ties with other members. Iqbal (2021) [86] in Cluster 4 shows strong connections with Huang (2022a) [89] and Jha (2017) [90], reinforcing its centrality. In Cluster 2, Basilio (2021) [95] shares substantial ties with Basilio (2022) [99] and other members, enhancing the cluster’s intellectual continuity. Chen (2020b) [97], with a collaboration score of 235, also shows solid bibliographic ties to works like Chen (2022c) [102] and Chen (2020a) [103], contributing to its unified research agenda. Other documents, such as Sharma (2023) [104], Goodell (2022) [98], and Ebadi (2020) [105], with collaboration scores ranging from 201 to 190, also exhibit significant bibliographic coupling. These documents maintain strong collaborative engagement within their clusters, contributing to the intellectual and thematic richness of the network.

4.1.4. Document Citation Burst Analysis

Figure 6 presents the citation burst analysis using CiteSpace, revealing significant patterns in the application of bibliometric and machine learning methodologies to research impact assessment. Blei (2003) [17] leads with the strongest citation burst (14.2) from 2015 to 2021, highlighting the foundational role of latent Dirichlet allocation (LDA) in modern bibliometric analysis. Garfield (2006) [106] displayed a notable burst (5.8) from 2011 to 2021, reflecting the lasting influence of traditional citation indexing in current frameworks. Hirsch (2005) [8], with a burst of 7.8 from 2016 to 2019, underscored the ongoing significance of the h-index in quantifying research impact. Teufel (2006) [83] showed a burst of 5.5 between 2015 and 2022, emphasising the growing adoption of data-driven citation classification.
More recent bursts were noted for Thelwall (2013) [107] (6.4) and Fortunato (2018) [108] (6.1), both between 2019 and 2022, marking increased attention on alternative metrics and the science of science. Abrishami (2019) [5] (5.4) and Zupic (2015) [109] (6.5) had bursts from 2020 to 2024, reflecting the rising importance of AI in bibliometrics. Roberts (2019) [110] and Van Eck (2014) [111] each had bursts of 6.0 from 2022 to 2024, highlighting the integration of advanced statistical and bibliometric tools in assessing research impact.

4.2. Analysis of Publication Sources

The bibliometric dataset consists of 1608 publications across 990 unique sources, highlighting the diversity of research dissemination outlets. Table 1 shows that Scientometrics leads with 104 publications (6.5%), 1843 local citations, an h-index of 23, and 57,465 global citations with an h-index of 144, highlighting its central role in bibliometric research. Following Scientometrics, IEEE Access contributes 53 publications (3.3%), with 607 local citations and an h-index of 11, alongside a substantial global citation count of 916,390 and an h-index of 242, emphasising its importance in data science and AI research. The Journal of Informetrics further cements its influence with 31 publications (1.9%), 498 local citations, an h-index of 11, 17,095 global citations, and an h-index of 90.
Key journals like the Journal of Scientometric Research, Sustainability, and Education and Information Technologies contribute 12 to 16 publications, reflecting specialised impact. For instance, Technological Forecasting and Social Change contributed 12 publications (0.75%) with 281 local citations, an h-index of 8, and 114,767 global citations with an h-index of 179. Multidisciplinary journals such as IEEE Transactions on Engineering Management, Frontiers in Psychology, Expert Systems with Applications, and PLoS ONE contribute between 8 and 11 publications each, highlighting their importance across diverse research domains.
Emerging journals like Quantitative Science Studies and Artificial Intelligence Review contribute fewer publications, but are influential in advancing scientometrics and AI research. Expert Systems with Applications and PLoS ONE stand out for their high global h-indices of 271 and 435, respectively, reflecting their significant global reach. The top 20 journals represent critical platforms for disseminating research on bibliometrics and ML/AI in research impact assessment. Their publication counts and citation metrics underscore their pivotal role in shaping discourse on advanced research evaluation methodologies.

4.2.1. Publication Citation Analysis

Figure 7 presents the local citation network analysis (within the database) using publications as the unit of analysis, revealing distinct clusters that emphasise thematic similarities and collaborative patterns among influential research entities. The network is divided into seven clusters, ranging from 6 to 30 members. Larger clusters suggest broader interdisciplinary research themes, while smaller ones indicate more specialised communities. Clusters, assessed through citation scores and relationship weights, offer insights into their influence and centrality within the network.
Cluster 1, the largest, with 30 members, focuses on bibliometrics, informetrics, and sustainability. Key entities include Scientometrics (citation score: 153.4), Journal of Informetrics (40.8), and Sustainable Development (26.7). Scientometrics maintains strong ties with the Journal of Informetrics (weight: 22) and the Journal of the Association for Information Science and Technology (weight: 11), reflecting shared research in scientific evaluation and citation analysis. Other contributors like AI Magazine, PLoS ONE, and Applied Soft Computing add to this cluster’s interdisciplinary nature, spanning informatics, AI, and computational sciences. Cluster 2, with 26 members, emphasises AI, educational technology, and computational research. Central publications include Educational Technology & Society (38.2), Education and Information Technologies (21.1), and Artificial Intelligence Review (19.9). Educational Technology & Society has strong connections with Education and Information Technologies (weight: 4) and the British Journal of Educational Technology (weight: 3), underscoring its role in integrating technology into education. Other key members include Complexity, Neural Computing and Applications, and Soft Computing, emphasising computational methodologies.
Cluster 3, with 13 members, focuses on technological management and information systems. Key publications include IEEE Access (54.9), IEEE Transactions on Engineering Management (35.9), and Information Systems Frontiers (48.0), reflecting strong engagement in technological innovation. IEEE Access has notable ties with Scientometrics (weight: 13) and Library Hi Tech (weight: 3), showcasing its interdisciplinary impact. Other members include Management Review Quarterly, Sustainability, and Technological Forecasting and Social Change. The top 10 publications across these clusters demonstrate varying influence based on citation scores and collaborative intensity. Scientometrics leads Cluster 1 with a citation score of 153.4 and strong ties to the Journal of Informetrics and IEEE Access. Similarly, IEEE Access in Cluster 3 (citation score: 54.9) exhibits interdisciplinary collaboration, particularly with Scientometrics and Library Hi Tech. Information Systems Frontiers (Cluster 3, citation score: 48.0) focuses on linking information systems with technological advancements.
The Journal of Informetrics (Cluster 1, citation score: 40.8) plays a crucial role, with ties to Scientometrics, PLoS ONE, and Research Evaluation, reinforcing its influence in scholarly impact studies. Educational Technology & Society (Cluster 2, citation score: 38.2) points to educational technology integration. Other influential entities like Technological Forecasting and Social Change (36.8), IEEE Transactions on Engineering Management (35.9), Sustainable Development (26.7), Education and Information Technologies (21.1), and Artificial Intelligence Review (19.9) reflect diverse research focuses and collaborative strengths within their respective clusters.

4.2.2. Publication Co-Citation Analysis

Figure 8 shows the co-citation network analysis using sources as the unit of analysis, revealing key insights into the structure and dynamics of research collaborations. Clustering entities based on co-citation patterns illustrates general and specific connections among research entities. Six distinct clusters were identified, varying in size and focus. Cluster 1, the largest, includes 83 members, while Cluster 6, the smallest, has nine members. These clusters reflect different research areas and levels of collaboration intensity. Collaboration scores indicate the total strength of links between entities, while weights measure the strength of individual co-citation relationships, with higher weights signifying stronger connections.
Cluster 1, featuring prominent entities like arXiv, Expert Systems with Applications, and IEEE Access, reflects an intense collaborative environment centred on AI, computational methods, and machine learning. Sources like Pattern Recognition and Neural Networks show strong co-citation patterns, with IEEE Access significantly contributing to cross-disciplinary research (collaboration score: 17.6). Cluster 3, including journals like Journal of the Association for Information Science and Technology and PLoS ONE, focuses on information science and open-access publishing. The Journal of Informetrics (collaboration score: 33.2) forms strong co-citation links with Scientometrics (weight: 505) and PLoS ONE (weight: 342), highlighting its central role in bibliometric and informetric research. Cluster 6, with Construction and Building Materials (collaboration score: 13.8), focuses on sustainable construction, with other members like Energy and Sustainability, reflecting a specialised, yet well-connected network in material sciences and engineering.
The top 10 entities are crucial nodes in their fields, each demonstrating varying levels of collaboration. For example, Scientometrics, the most collaborative entity with a collaboration score of 69.9, has extensive relationships with sources like Neural Networks (weight: 1421) and Advanced Neural Information Processing Systems (weight: 979), showcasing its cross-disciplinary impact. Similarly, the Journal of Informetrics has notable ties with PLoS ONE and IEEE Access, reinforcing its role in information metrics research.

4.2.3. Publication Bibliographic Coupling Analysis

Figure 9 shows the bibliographic coupling network analysis using sources as the unit of analysis, offering insights into collaboration dynamics and thematic clustering. The network is organised into three clusters characterised by unique collaborative patterns and research orientations. Cluster 1, the largest, comprises 97 members, indicating a highly dense and diverse collaborative environment. Cluster 2, with 66 members, represents a moderately sized group with significant interconnections, while Cluster 3, the smallest, with 33 members, reflects a more specialised research focus. The distribution reveals varying thematic coherence and collaboration intensity levels, with each cluster showing distinct research behaviour.
In Cluster 1, educational technologies, scientometrics, and applied sciences dominate. Key entities include Education and Information Technologies (collaboration score: 1156), Journal of Scientometric Research (1360), Kybernetes (1061), and Sustainability (1021). Education and Information Technologies collaborates with the 18th International Conference on Scientometrics and Informetrics, AI & Society, and Applied Sciences, reflecting its focus on educational research and scientometrics. The Journal of Scientometric Research connects with College and Undergraduate Libraries and Biochemistry and Molecular Biology Education, emphasising bibliometric research. Kybernetes collaborates with the Journal of Internet Technology and Education and Information Technologies, focusing on digital technologies. Sustainability links with Environmental Science and Pollution Research and Global Knowledge, Memory and Communication, highlighting its interdisciplinary approach.
Cluster 2 includes entities like IEEE Access (collaboration score: 4584), Journal of Informetrics (3954), Journal of Information Science (1713), Quantitative Science Studies (1010), and Scientometrics (9058). IEEE Access collaborates with ACM Transactions on Knowledge Discovery from Data, Expert Systems with Applications, and Electronic Markets, reflecting its interdisciplinary appeal. The Journal of Informetrics strongly ties with IEEE Transactions on Software Engineering and the Journal of Documentation. At the same time, the Journal of Information Science collaborates with Cognitive Computation and Applied Intelligence. Quantitative Science Studies is linked with PLoS ONE and IEEE Transactions on Learning Technologies. Scientometrics stands out for its wide-ranging collaborations, reinforcing its central role in the network.
Cluster 3, with 33 members, focuses on advanced technology and computational themes. The leading entity, Technological Forecasting and Social Change (collaboration score: 1057), collaborates with the British Journal of Educational Technology, Cognitive Computation, and Neural Computing and Applications, integrating social sciences with technological advancements.

4.3. Analysis of Researchers

The analysis of 4863 authors in research impact science reveals a long-tailed productivity distribution, with a few authors contributing significantly to total publications, as shown in Table 2. Xieling Chen is the most prolific, with 20 publications (0.35% of the dataset), followed by Haoran Xie and Di Zou, with 19 (0.33%) and 13 publications (0.24%), respectively. Saeed-Ul Hassan, Gary Cheng, and Naif Radi Aljohani are also notable, with between 8 and 11 publications each, highlighting varied contributions within the field.
Applying Lotka’s law, the β coefficient of 2.96 indicates a steeper-than-expected decline in author productivity. An R-squared value of 0.92 suggests a strong model fit, although a Kolmogorov–Smirnov test p-value of 0.046 points to deviations from this theoretical distribution. These findings suggest nuanced author productivity patterns unique to this field.
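The β coefficient of Lotka’s law, f(n) ∝ n^(−β), is commonly estimated by a least-squares fit on the log-log frequency data. The study does not specify its exact fitting procedure, so the sketch below, using hypothetical counts that follow an exact inverse-square law, is purely illustrative.

```python
from math import log

def fit_lotka_beta(pub_counts):
    """Estimate Lotka's beta via least squares on log(authors) vs
    log(publications): log f(n) = log C - beta * log n."""
    xs = [log(n) for n, _ in pub_counts]
    ys = [log(a) for _, a in pub_counts]
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Hypothetical (publications, authors) pairs following f(n) = 1000 / n**2
data = [(1, 1000), (2, 250), (5, 40), (10, 10)]
print(round(fit_lotka_beta(data), 2))  # 2.0 for an exact inverse-square law
```

A β above 2, such as the 2.96 reported here, means author productivity falls off faster than Lotka’s classical inverse-square prediction.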
The impact of these authors is quantified by h-index and citation metrics. For instance, Haoran Xie demonstrates substantial scholarly influence with an h-index of 55 and 18,770 citations. In contrast, though less prolific, Lutz Bornmann and Wolfgang Glanzel show significant global recognition with high h-index values of 88 and 89 and citations over 27,000 each. At the emerging end of the spectrum, Sukhwan Jung and Iqra Safder, who have lower citation counts and h-index values, contribute to the field’s methodological diversity. Remarkably, Feng Xia, with only five publications, has amassed 21,276 citations and an h-index of 72, highlighting that impact transcends publication frequency.

4.3.1. Researchers’ Co-Authorship Analysis

The co-authorship network analysis revealed 1082 distinct groups, each representing a cluster of collaborative authors within the larger academic field. These groups vary considerably in size, with an average of 4.44 authors (SD = 3.91) and a median of 3, reflecting predominantly small clusters. The smallest group comprised only 2 authors, whereas the largest encompassed 51, illustrating a highly skewed distribution of group sizes (skewness = 5.62, kurtosis = 47.44).
The centrality analysis showed that the most central author in each group had an average degree of 3.28 (SD = 3.12), with degrees ranging from 1 to 50. This highlights a few highly connected authors amidst many with fewer connections, as evidenced by the high skewness (5.13) and kurtosis (53.92). A quartile analysis indicated that 25% of groups had a central author with just one connection, whereas 75% had up to four, suggesting a fragmented network with weakly and more cohesively connected clusters.
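Skewness and kurtosis of this kind follow from the standard moment formulas. The sketch below uses hypothetical group sizes (mostly small, with one large outlier) to reproduce the qualitative right-skewed pattern; the exact estimator used in the study (population versus sample-adjusted) may differ.

```python
def skewness_kurtosis(values):
    """Population skewness (m3 / m2**1.5) and excess kurtosis
    (m4 / m2**2 - 3); sample-adjusted estimators would differ slightly."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical group sizes: many small clusters plus one 51-member outlier.
sizes = [2, 2, 3, 3, 3, 4, 5, 51]
skew, kurt = skewness_kurtosis(sizes)
print(skew > 2, kurt > 0)  # True True: heavy right skew, fat tail
```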
Figure 10 shows the networks of the top 20 authors, highlighting the number of collaborators in each network group and revealing the breadth of their collaborations. For instance, Zhang Yi leads Group 17 as the most central figure. Hassan Saeed-Ul is pivotal in Group 2, alongside other notable researchers such as Naif Radi Aljohani, Lutz Bornmann, and Iqra Safder, making it a key hub for collaboration. In Group 1, Chen Xieling plays a central role, facilitating significant interactions among top authors like Xie Haoran and Zou Di.
Additional central figures include Afzal Muhammad Tanvir in Group 3, Mayr Philipp in Group 4, and Glanzel Wolfgang in Group 5, each leading their respective groups. This analysis underscores the varied, yet profound influence of productive authors like Zhang Yi, Hassan Saeed-Ul, and Chen Xieling, who drive extensive networks that significantly impact the collaborative landscape of the field.
In Figure 11, the authors’ co-authorship collaboration network highlights 55 major clusters drawn from the 1082 distinct groups of researchers collaborating within the field. These clusters range in size from 2 to 37 members, reflecting a range of collaboration dynamics. Larger clusters suggest broader, integrated research communities, while smaller ones often represent specialised or niche areas.
Chen Xieling, with a collaboration score of 75.0, is central in Cluster 9, which includes 26 members. Significant collaborators, such as Haoran Xie and Gary Cheng, share strong ties with Chen, with link weights of 19.0 and 11.0, respectively. With a collaboration score of 47.0, Gary Cheng also plays a crucial role in maintaining Cluster 9’s activity alongside Chen and Xie. Similarly, Di Zou, with a collaboration score of 45.0, has strong ties with Chen (weight: 13.0) and Cheng (weight: 9.0), reinforcing his significant contributions to this cluster.
Naif Radi Aljohani, with a collaboration score of 34.0, is a central figure in Cluster 3, a larger group of 34 members that includes Saeed-ul Hassan and Raheel Nawaz. Strong co-authorship ties exist between Aljohani and Hassan (weight: 8.0) and Aljohani and Faisal Kamiran (weight: 3.0), highlighting the cluster’s interconnected nature. With a collaboration score of 45.0, Saeed-ul Hassan also holds a central position in Cluster 3, further emphasising their importance in the network. Chen Chuan-Chiang, from Cluster 17, with a score of 28.0, collaborates with Kathy Schmidt Jackson and Firas Akasheh in a specialised group of 17 members, reflecting a focused research community with impactful co-authorships.

4.3.2. Researchers’ Citation Analysis

Figure 12 shows the network divided into seven clusters. Cluster 1, the largest, has 35 members, representing a densely connected group likely engaged in interdisciplinary collaborations. Cluster 7, the smallest with 13 members, may reflect a more specialised focus. Mid-sized clusters, like Cluster 4, with 25 members, suggest balanced collaboration efforts. Larger clusters may indicate interdisciplinary activity, while smaller ones imply concentrated research areas.
Cluster 4 is the most influential, containing four of the 10 most cited authors. Chen Xieling (citation score: 74.8), the highest-ranked, has strong co-authorship ties with Xie Haoran (weight: 67), Zou Di (41), and Cheng Gary (32). Chen also maintains moderate ties with Wang Fu Lee (24) and Hao Tianyong (18), reinforcing Chen’s central role. Xie Haoran (citation score: 71.0) mirrors Chen’s network, showing robust relationships with Zou Di (39) and Cheng Gary (30) and an additional tie to Huang Xinyi (6). This interconnectedness highlights Cluster 4 as a critical node in the citation network, uniting top researchers for impactful output. Zou Di (citation score: 57.3) collaborates with Cheng Gary (22) and Wang Fu Lee (9) and has ties to Huang Xinyi (6). Cheng Gary (citation score: 43.7) plays a pivotal role, with partnerships including Wang Fu Lee (9) and Hao Tianyong (6), contributing to the cluster’s cohesion despite fewer high-weight collaborations.
Cluster 1 includes Muhuri Pranab K. and Shukla Amit K. (citation score: 18.9 each), whose focused partnership (weight: 2) and ties with Ricardo Arencibia-Jorge and Humberto Carrillo-Calvet (1 each) indicate specialised research. Though larger, this cluster’s collaborations are less dense than those in Cluster 4. Cluster 3, with 29 members, features Zhang Yi (citation score: 11.9), with moderate collaborations with Chen Hongshu, Huang Lu, and Lu Jie (2 each). While smaller in citation impact, this cluster focuses on specific academic fields.

4.3.3. Researchers’ Co-Citation Analysis

Figure 13 illustrates the co-citation network analysis using authors as the unit of analysis, revealing distinct patterns of collaboration and influence among scholars in the field. The network comprises four clusters, each representing groups of authors with interconnected co-citation relationships. Cluster 1, the largest, includes 120 members, representing a broad, interconnected network spanning diverse fields like psychology, information science, and machine learning. Cluster 2 contains 88 members, primarily focused on bibliometrics, research evaluation, and science mapping. Cluster 3, with 39 members, is medium-sized, while Cluster 4, with 36 members, represents a specialised community focused on information visualisation and scientometrics.
Bornmann, L., with the highest collaboration score of 3907, leads the network from Cluster 2. His strong ties with Waltman, L (weight: 198), Gigerenzer, G (185), and Marewski, JN (168) emphasise his role in bibliometrics and research evaluation. His collaborations with influential figures, like Abramo, G. and Adams, J, reinforce his diverse research network. Van Eck, NJ, also in Cluster 2, follows with a collaboration score of 2045. His substantial collaborations with Cobo, MJ (weight: 73) and Waltman, L (48) highlight his role in bibliometric tools like VOSviewer. He also interacts with Blei, DM (46) and Zupic, I (35), reflecting his cross-disciplinary engagement. Leydesdorff, L, also in Cluster 2 with a collaboration score of 1584, has contributed significantly to knowledge flows in scientific research, collaborating with Moed, HF and Egghe, L.
Blei, DM, from Cluster 1, with a collaboration score of 1889, is noted for his work on latent Dirichlet allocation (LDA). His strong ties with Griffiths, TL (weight: 82) and Zhang, Y (65) highlight his central role in machine learning. His collaborations with Chen, H and Borgatti, SP underscore his interdisciplinary impact. Garfield, E (Cluster 1), with a collaboration score of 1869, has strong ties with Hirsch, JE (79) and Small, H (62), reflecting his foundational influence in citation indexing and scientometrics. Hirsch, JE (Cluster 1) has a collaboration score of 1676 and is known for the h-index. His strong collaborations with Waltman, L (44) and Chen, XL (34) underscore his impact on research evaluation. His ties with Schreiber, M (27) and Wang, MY (20) further emphasise his influence in scientometrics.
In Cluster 4, Moed, HF (collaboration score: 1429) plays a significant role in research assessment, with strong ties to Boyack, KW and Fortunato, S, contributing to scientific knowledge visualisation. Small, H (Cluster 4, collaboration score: 1369) is known for developing co-citation analysis, with links to key figures like Chen, XL and Egghe, L, highlighting his influence in knowledge visualisation. Chen, C (Cluster 4, collaboration score: 1322) is recognised for his work in information visualisation, mainly through CiteSpace. His strong ties with Boyack, KW (weight: 33) and Fortunato, S (17) emphasise his contributions to visualising scientific domains. Zitt, M (Cluster 4, collaboration score: 1275) rounds out the top 10, known for his work on research performance indicators and his collaborations with Chang, YW and Egghe, L.
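Co-citation counting, the basis of the network in Figure 13, is straightforward to sketch: two authors are co-cited whenever a later publication cites both of them, and an author’s total link strength aggregates those pairwise weights. The reference lists below are fabricated toy data reusing names from this section; this is an illustrative sketch, not the study’s pipeline:

```python
from collections import Counter
from itertools import combinations

# Toy reference lists: each citing paper maps to the (deduplicated) set of
# cited authors it references. Names are illustrative, not the study's data.
citing_papers = {
    "paper1": {"Bornmann L", "Waltman L", "Garfield E"},
    "paper2": {"Bornmann L", "Waltman L", "Hirsch JE"},
    "paper3": {"Garfield E", "Small H", "Bornmann L"},
}

cocitations = Counter()
for cited_authors in citing_papers.values():
    # Every unordered pair of authors cited together gains one co-citation.
    for pair in combinations(sorted(cited_authors), 2):
        cocitations[pair] += 1

# An author's total link strength sums their co-citation weights, analogous
# to the collaboration scores reported for Figure 13.
link_strength = Counter()
for (a, b), w in cocitations.items():
    link_strength[a] += w
    link_strength[b] += w

print(cocitations[("Bornmann L", "Waltman L")])  # co-cited in papers 1 and 2
print(link_strength.most_common(1))
```

Clustering such a weighted network (as VOSviewer or CiteSpace do) is a separate step applied on top of these edge weights.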

4.3.4. Researchers’ Bibliographic Coupling Analysis

Figure 14 illustrates the bibliographic coupling network analysis, which examines collaborative structures among authors by identifying clusters whose publications share bibliographic references. The network is divided into six distinct clusters, each representing groups of authors with shared research themes or frequent collaborations. Clusters vary significantly in size, from Cluster 1, the largest with 30 members, to Cluster 6, the smallest with seven members. This variation reflects differing levels of collaboration intensity, with larger clusters representing broader, interdisciplinary research areas while smaller clusters focus on more specialised topics.
Cluster 1, the largest and most interconnected, includes prominent authors like Chen Xieling, Xie Haoran, Ayaz Ahmet, Aytac Orhan, Wang Fu Lee, and Liu Xiaohong. Chen Xieling holds the highest collaboration score (7987), with strong ties to Xie Haoran (weight: 2637) and Zou Di (weight: 1442), indicating extensive collaborative networks. Xie Haoran (collaboration score: 7822) also has significant ties with Zou Di (1430) and Cheng Gary (1306), emphasising the influential nature of collaborations in Cluster 1. Cluster 4, with 16 members, represents a more focused research theme, featuring top authors like Alelyani Salem, Alhoori Hamed, Naif Radi Aljohani, Bornmann Lutz, Xu Zeshui, and Zhang Yi. Zou Di’s strong collaboration with Cheng Gary (weight: 930) highlights selective partnerships based on thematic focus. Cheng Gary (collaboration score: 4652) also collaborates with Wang Fu Lee (weight: 369) and Huang Xinyi (351), demonstrating intra-cluster and broader collaborative efforts.
Relationship analysis shows how the top 10 authors are positioned within these clusters. Chen Xieling and Xie Haoran, central figures in Cluster 1, combine high in-degree with selective out-degree, indicating their dual roles as widely referenced influencers and deliberate collaborators. Xie, for instance, has an in-degree of 70 and an out-degree of 4, illustrating selective outbound collaborations. Liu Xiaohong, with a collaboration score of 3154, bridges different authors like Wang Fu Lee (240) and Zhang Yi (110), promoting cross-disciplinary research. In Cluster 4, authors like Alelyani Salem and Alhoori Hamed maintain strong, focused networks. Zhang Yingying (collaboration score: 2926) strategically partners with Xu Zeshui (weight: 500), while Tang Ruiming (collaboration score: 2817) shows targeted connections, particularly with Zhao Xiang (weight: 170). Authors like Zhao Xiang (collaboration score: 2772) and Li Hanzhou (collaboration score: 2734) maintain focused networks, with their mutual tie (weight: 150) indicating a preference for depth over breadth in collaborative efforts. This network analysis reveals diverse strategies among top authors, with Cluster 1 functioning as a broad, interdisciplinary hub, while Cluster 4 is more specialised.
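Bibliographic coupling is the dual of co-citation: two authors (or papers) are coupled when their own reference lists overlap, and the edge weight is the size of that overlap. A minimal sketch, with invented author-to-reference mappings:

```python
from itertools import combinations

# Toy data: each author maps to the set of references their publications cite.
# Author names and reference IDs are illustrative only.
author_refs = {
    "Chen X":  {"r1", "r2", "r3", "r4"},
    "Xie H":   {"r1", "r2", "r3", "r5"},
    "Zou D":   {"r2", "r3", "r6"},
    "Zhang Y": {"r7", "r8"},
}

# Coupling weight between two authors = number of shared references.
coupling = {
    (a, b): len(author_refs[a] & author_refs[b])
    for a, b in combinations(author_refs, 2)
}

# Keep only pairs with at least one shared reference (the network's edges).
edges = {pair: w for pair, w in coupling.items() if w > 0}
print(edges)
```

In this toy network, "Zhang Y" shares no references with anyone and so remains isolated, mirroring how coupling analysis separates specialised authors from densely connected clusters.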

4.4. Analysis of Institutions

The institutional productivity analysis highlights the contributions of 1684 global institutions, reflecting the breadth of collaboration in this domain. The Education University of Hong Kong leads with 34 publications (0.84%), showcasing its pivotal role in advancing research in education and policy. The University of California system and Wuhan University follow closely, each with 26 publications (0.64%), reflecting their significant output in education, data science, and policy studies.
Other notable contributors include the University of Granada, with 22 publications (0.54%), and KU Leuven, with 19 (0.47%), maintaining a solid presence in European social sciences and technology research. Tsinghua University and the Pennsylvania Commonwealth System of Higher Education (PCSHE), also with 19 publications (0.47%), emphasise the influence of Chinese and North American institutions in shaping global collaborations. Lingnan University and the University of London, each with 18 publications (0.44%), highlight the global impact of educational research in Greater China and Europe. In contrast, the Chinese Academy of Sciences, with 17 publications (0.42%), underscores China’s investment in scientific research.
Harvard University and Nanyang Technological University each contributed 16 publications (0.39%), showcasing sustained productivity across multiple fields, from science and technology to social sciences. King Abdulaziz University and the National Institutes of Health (NIH)—USA, with 15 publications (0.37%), reflect the growing importance of health-related research and collaborations between the Middle East and the US. Columbia University, Leiden University, and the University of Texas System, each with 14 publications (0.34%), highlight the global distribution of research productivity.

4.4.1. Institutions’ Co-Authorship Analysis

Figure 15 illustrates the co-authorship network analysis, revealing nine distinct clusters of institutions with cluster sizes ranging from 65 to 122 members. These clusters represent various collaborative communities, with larger ones like Cluster 1 (122 members) indicating broad co-authorship networks and smaller clusters reflecting more specialised academic collaborations.
The Education University of Hong Kong leads with a collaboration score of 48.0 in Cluster 8, which includes 69 members. Its strongest partnerships are with Lingnan University (weight: 18) and South China Normal University (weight: 5), highlighting significant co-authorship within Greater China, particularly in educational and policy research. University College London (UCL) follows with a collaboration score of 43.0 in Cluster 2 (120 members), with key links to Queen’s University (weight: 2) and the University of Sargodha (weight: 1). UCL’s diverse global collaborations across Europe, Asia, and North America emphasise its influence in fostering international academic partnerships.
The University of Oxford, with a score of 42.0, is part of Cluster 7 (83 members), collaborating with institutions like Stanford University and Harvard University (both weights: 1). These ties reflect Oxford’s role in a prestigious global research network, advancing knowledge exchange among top institutions. Lingnan University, with a score of 38.0, shares Cluster 8 with the Education University of Hong Kong, collaborating with Guangzhou University and Shenzhen University (weights: 4 and 3). Its link to the Education University of Hong Kong (weight: 18) underscores its active role in the Hong Kong–China educational research network. King Abdulaziz University (score: 37.0) and King Khalid University, both in Cluster 1 (122 members), have key partnerships with regional institutions such as Information Technology University (weight: 8) and Effat University (weight: 1), emphasising their central role in academic collaborations across the Middle East, particularly in technology and applied sciences.

4.4.2. Institutions’ Citation Analysis

Figure 16 shows the citation network analysis of organisations, revealing insights into collaboration patterns, research impact, and institutional clusters. Seven distinct clusters were identified, ranging from Cluster 1 with 76 members to Cluster 7 with 29. Larger clusters represent broader networks, while smaller ones may indicate more specialised research communities. The Education University of Hong Kong leads with a citation score of 80.7 in Cluster 6 (30 members). Notable collaborators include Lingnan University (citation score: 65.4) and the Chinese Academy of Agricultural Sciences. Strong ties between the Education University of Hong Kong and Lingnan University (weight: 53) and South China Normal University (weight: 20) reflect intense regional collaboration and academic leadership. Lingnan University also maintains key partnerships with South China Normal University (weight: 18) and Jinan University (weight: 13), reinforcing its significant academic influence within the cluster.
Swinburne University of Technology (citation score: 40.4) is part of Cluster 3 (59 members). Key collaborators include Banaras Hindu University (citation score: 35.2) and South Asian University (citation score: 27.0). While Swinburne has fewer collaborations, its ties with institutions like Universidade Presbiteriana Mackenzie and Yeungnam University (weight: 1 each) show focused, high-quality partnerships. Banaras Hindu University’s moderate but impactful connections to South Asian University (weight: 2) highlight its regional and international influence. Tsinghua University (citation score: 29.3) is in Cluster 1, the largest and most diverse. Notable collaborators include the Beijing Academy of Artificial Intelligence, Monash University, and Georgia Institute of Technology, with strong ties to the Dalian University of Technology (weight: 2). The University of Oxford (citation score: 31.0), in Cluster 5 (38 members), maintains significant partnerships with institutions like Harvard, Stanford, and Imperial College London (weight: 2 with Imperial), underscoring its global research impact. Shanghai University (citation score: 29.8) and the University of Pennsylvania (citation score: 34.0) are part of Cluster 2 (67 members), alongside the Max Planck Society, Aligarh Muslim University, and King Khalid University. Despite Shanghai University’s modest collaboration weights (1 with Chandigarh University and King Khalid University), its presence in such a large cluster highlights its academic influence.

4.4.3. Institutions’ Bibliographic Coupling Analysis

Figure 17 shows the bibliographic coupling network analysis using organisations as units of analysis. It reveals five distinct clusters that reflect varying levels of academic collaboration based on shared references in scholarly publications. These clusters vary in size, with Cluster 1 being the largest (75 members) and Cluster 5 the smallest (30 members). This variation suggests that some clusters have broader institutional connections with shared research interests while others are smaller and more specialised.
Cluster 1 includes globally recognised institutions like Beihang University, the Chinese Academy of Sciences, and the University of São Paulo, reflecting strong technology, engineering, and science networks. Cluster 2, with 65 members, features institutions such as Education University Hong Kong, Lingnan University, and South China Normal University, highlighting regional collaboration in educational and social sciences. Cluster 3 (45 members) includes Dalian University of Technology, Kyushu University, and University of California Berkeley, focusing on technology and interdisciplinary research. Clusters 4 and 5, the smallest, include institutions like King Abdulaziz University, King Khalid University, and Manchester Metropolitan University, highlighting international collaborations across disciplines.
The top 10 institutions within these clusters, ranked by collaboration scores, show varying degrees of influence. The Education University of Hong Kong (Cluster 2) leads with a score of 8460, forming strong ties with Lingnan University (weight: 2566) and Open University Hong Kong (weight: 846), underscoring its regional leadership. Lingnan University (Cluster 2), scoring 7371, has vital partnerships with Open University Hong Kong (weight: 771) and the University of Southern Queensland (weight: 553), enhancing its network position. King Abdulaziz University (Cluster 5) ranks third with a score of 4585, showing collaborations with Information Technology University (weight: 800) and Manchester Metropolitan University (weight: 574), reflecting its interdisciplinary and international focus. Information Technology University (Cluster 5) follows with a score of 3355, marked by solid links to Manchester Metropolitan University (weight: 514) and King Khalid University (weight: 308), focusing on technology research.
Manchester Metropolitan University (Cluster 5), scoring 3159, maintains strategic partnerships with King Khalid University (weight: 231) and Max Planck Gesellschaft (weight: 167), reflecting its diverse research approach. Open University Hong Kong (Cluster 2), with a score of 3130, emphasises regional education collaboration with the University of Southern Queensland (weight: 465) and South China Normal University (weight: 268). South China Normal University (Cluster 2), with a score of 2910, has strong ties with the University of Southern Queensland (weight: 171) and Dalian University of Technology (weight: 81), focusing on international exchange. King Khalid University (Cluster 5), scoring 2766, maintains partnerships with the University of Belgrade (weight: 229) and Korea University (weight: 228), emphasising its influence in the humanities. Dalian University of Technology (Cluster 3), with a score of 2760, engages internationally with the University of South Australia (weight: 128) and Peking University (weight: 97), positioning it as a key player in technology and engineering research. This analysis illustrates the intricate web of collaborations among top institutions, each contributing uniquely to the global academic landscape through their research networks.

4.5. Analysis of Countries

The analysis of country-level contributions to research impact science highlights notable global participation, with China as the leading contributor. Table 3 shows that China produced 446 publications, accounting for 28.3% of the total output, highlighting its strategic investment in research, especially in data science and artificial intelligence. The United States follows with 240 publications (15.3%), showcasing its sustained leadership and commitment to advancing research impact methodologies. With 108 publications (6.9%), India underscores its growing role in the global research landscape, driven by investments in AI and data-driven research. Germany and the United Kingdom contribute 65 (4.1%) and 55 (3.5%) publications, respectively, reflecting their established research infrastructures and contributions to research impact assessment.
Figure 18 illustrates the publication trends of the top five contributing countries. China’s post-2015 growth is striking, with a 41% average annual growth rate, while India and Germany show strong growth rates of 43% and 54%, respectively. The United States (22%) and the United Kingdom (39%) show steady, albeit slower growth, maintaining their central roles in advancing the field. Spain (51 publications, 3.2%), Australia (44 publications, 2.8%), and Canada (37 publications, 2.4%) demonstrate active engagement in advancing research impact methodologies. Other contributors include France and South Korea (31 publications, 2.0% each), Pakistan (30 publications, 1.9%), Brazil (27 publications, 1.7%), Italy (26 publications, 1.7%), the Netherlands (24 publications, 1.5%), and Russia (22 publications, 1.4%). Japan (21 publications, 1.3%), Iran (18 publications, 1.1%), Romania and Singapore (17 publications, 1.1% each), and Indonesia (14 publications, 0.9%) further emphasise the global reach of research impact science. These contributions highlight the growing international focus on using data science and AI to improve research impact evaluation, moving beyond traditional metrics.

4.5.1. Countries’ Co-Authorship Analysis

Figure 19 shows the co-authorship network analysis, identifying four distinct clusters with unique collaboration dynamics and varying interconnectedness. Cluster sizes range from 32 countries in Cluster 1 to 12 countries in Cluster 4. The composition of these clusters reflects different levels of collaboration intensity and strategic focus, which are further detailed by the 10 countries with the highest collaboration scores, indicating cumulative collaboration strength.
Cluster 1, the largest with 32 members, includes major global research players like the USA, Canada, Mexico, Argentina, South Korea, and Germany. The USA leads this cluster and the network with a collaboration score of 200, marked by strong ties with Canada, Germany, and Australia (weights of 12 each). Other notable collaborations, like those with Mexico (weight: 4) and South Korea (weight: 1), emphasise the USA’s central role in global research, supported by 320 documents and 5156 citations. With a score of 75, Germany plays a pivotal role, collaborating with England and France (weights of 4 each). Canada, scoring 70, maintains strong links with the USA and England, with 91 documents and 1116 citations, while Italy, scoring 64, fosters European collaborations with Germany and England (weights: 4 and 3). Cluster 2, with 22 members, features a diverse mix of countries, including Brazil, England, Iran, Iraq, and Saudi Arabia. England leads this cluster with a score of 131, marked by strong ties with Australia (weight: 8) and Canada (weight: 5), supported by 91 documents and 929 citations. Saudi Arabia, scoring 83, focuses on bilateral ties with Pakistan (weight: 17), while France, with a score of 73, balances regional and global collaborations, including relations with China and Canada (weights: 3 each).
Cluster 3 includes 20 members, featuring China, Australia, Azerbaijan, and Croatia. China leads with a score of 120, maintaining strong collaborations with Australia (weight: 17), Germany, India, and Pakistan, contributing 469 documents and 4166 citations. Australia, scoring 82, engages actively with England (weight: 8) and Canada, supported by 68 documents and 1065 citations. India, scoring 62, rounds out the cluster with collaborations with China and Saudi Arabia, contributing 48 documents and 1141 citations. Cluster 4, the smallest with 12 members, focuses on specialised research areas. While it includes fewer top-ranking countries, its targeted collaborations likely reflect niche research efforts that warrant further investigation. This cluster analysis highlights how countries strategically position themselves within the global research landscape, leveraging networks to optimise collaboration intensity and scientific impact. Each cluster’s unique composition, alongside the top 10 countries’ varying roles, underscores their contributions to the co-authorship network.

4.5.2. Countries’ Citation Analysis

Figure 20 presents the citation network analysis, offering a detailed view of collaborative research relationships between countries, measured through clusters, weights, and scores. The network is divided into five clusters, with Cluster 1 and Cluster 2 being the largest (26 members each) and Cluster 5 the smallest (7 members). These clusters reflect different interaction patterns, from broad, interconnected networks to more specialised collaborations.
In Cluster 2, China leads, with a citation score of 437.7. Its strong partnerships with the USA (weight 44), Australia (weight 33), and India (weight 21) highlight its global research impact, further supported by ties with Iran (weight 18) and Spain (weight 15). The USA, with a citation score of 401.0, plays a crucial role in Cluster 2, fostering high-impact collaborations with England (weight 19) and Germany (weight 18), as well as connections with Australia (weight 15) and Brazil (weight 7), showcasing a diverse research network.
India, in Cluster 1, has a citation score of 175.2 and strong ties with Mexico (weight 11), Canada (weight 5), and Pakistan (weight 5). India’s network, extending to Russia and Italy, emphasises its growing influence across Asia and Europe. In Cluster 3, England and Canada, despite being part of a smaller group (10 members), maintain critical roles. England, with a citation score of 137.1, has strong links with Pakistan (weight 19), Saudi Arabia (weight 14), and Australia (weight 9). Canada, scoring 58.8, strengthens the cluster through collaborations with Saudi Arabia (weight 12) and Pakistan (weight 8). In Cluster 5 (the smallest cluster), Germany has a citation score of 95.0 and maintains close research ties with France, Switzerland, and Austria. This smaller cluster is highly focused and productive, contributing to Europe’s academic output. Australia, also in Cluster 2, has a citation score of 136.0 and significant connections with Pakistan (weight 6), Canada (weight 5), and Germany (weight 4). Its collaborations with Iran and South Korea (weight 3 each) further enhance its influence in the Pacific and European academic networks.

4.5.3. Countries’ Bibliographic Coupling Analysis

In Figure 21, the bibliographic coupling network analysis reveals five distinct clusters reflecting countries’ collaborative landscapes. These clusters represent countries engaged in co-authorship and academic collaboration. The clusters vary, from Cluster 1 with 35 members to Cluster 5 with 11 members. Larger clusters suggest broader and more interconnected research collaborations, while smaller clusters indicate more specialised areas of cooperation.
Cluster 1, the largest, comprises 35 countries, including the United States (USA), Germany, Australia, Russia, and China. China leads with the highest collaboration score of 49,058, reflecting strong partnerships with the USA (weight: 7234), Australia (weight: 3966), and India (weight: 3646). The USA, with a collaboration score of 39,316, has robust ties with India (2550), Germany (2320), and Australia (2249). Germany (score: 19,922) maintains strong relationships with England (993) and Australia (924), while Australia (score: 19,289) emphasises partnerships with England (1016) and Canada (529). Cluster 2, with 22 members, includes countries such as Brazil, Italy, Mexico, Poland, and Spain. Brazil (score: 13,964) plays a crucial role, while Italy (score: 10,381) maintains strategic ties with Malaysia and Portugal. Spain (score: 14,369) has strong collaborations with Italy (581) and Turkey (500), contributing to a diverse mix of regional networks. Turkey and Poland further enhance Cluster 2’s collaborative dynamics.
Cluster 3, with 12 members, focuses on Asian countries such as India, South Korea, Taiwan, and Malaysia. India leads with a score of 26,624 and key collaborations with Australia (1540) and Germany (1205). South Korea (score: 12,089) strengthens the cluster through collaborations with Saudi Arabia (463) and Spain (367). Taiwan and Malaysia contribute to the cluster’s cohesion through active co-authorship networks. Cluster 4, with 12 members, includes countries such as England, Pakistan, Saudi Arabia, and the UAE. England leads with a score of 18,781, showing strong ties with Saudi Arabia (1281) and Pakistan (1073). Saudi Arabia (score: 14,599) connects with Pakistan (1804) and South Korea (463), demonstrating the cluster’s diverse research partnerships. Scotland and Ireland further support the cluster’s collaborative efforts. Cluster 5, the smallest with 11 members, focuses on specialised research areas. Though less prominent, it includes countries with focused collaborations often related to niche areas or specific regional emphases.

4.6. Analysis of Research

4.6.1. Authors’ Keyword Analysis

Figure 22 shows the author keyword co-occurrence network analysis across scholarly articles, identifying three principal clusters that encapsulate the key research themes in the integration of machine learning and artificial intelligence into research impact analysis. The largest cluster contains 176 keywords, dominated by computational methods such as ‘machine learning’ and ‘text mining’. The smallest cluster, with 107 keywords, likely addresses more specialised or emerging research areas.
Using collaboration scores (quantitative measures of total link strength), this analysis highlights academic influence and interconnectivity. ‘Machine learning’ leads with a collaboration score of 601.0, strongly linking with ‘bioinformatics’ (weight 8.0) and ‘software engineering’ (weight 38.0), underscoring interdisciplinary collaboration. Similarly, ‘artificial intelligence’ scores 582.0 and connects significantly with ‘robotics’ and ‘software engineering’, showcasing its broad influence on research methodologies.
The network is dense in Cluster 1, where ‘machine learning’ and ‘artificial intelligence’ reside. ‘Natural language processing’ (score 199.0) and ‘text mining’ (162.0) have strong ties to ‘bioinformatics’ (weight 10.0) and ‘topic analysis’ (weight 13.0), illustrating AI’s role in processing complex datasets. ‘Bibliometrics’ (525.0) and ‘bibliometric analysis’ (343.0) frequently connect with ‘software engineering’ (weights of 39.0 and 49.0), highlighting the role of technological advancements in refining scientific mapping and research trend analysis.
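The total link strength that VOSviewer-style tools report for each keyword is the sum of its co-occurrence weights across all linked keywords. A minimal sketch with invented keyword lists (the terms echo Figure 22’s themes, but the counts are purely illustrative):

```python
from collections import Counter
from itertools import combinations

# Toy author-keyword lists per article; the lists are fabricated for illustration.
articles = [
    ["machine learning", "text mining", "bibliometrics"],
    ["machine learning", "artificial intelligence", "software engineering"],
    ["bibliometrics", "machine learning", "software engineering"],
    ["artificial intelligence", "robotics"],
]

# Each article contributes one co-occurrence for every unordered keyword pair.
cooccur = Counter()
for keywords in articles:
    for pair in combinations(sorted(set(keywords)), 2):
        cooccur[pair] += 1

# Total link strength per keyword: the 'collaboration score' used in the text.
strength = Counter()
for (a, b), w in cooccur.items():
    strength[a] += w
    strength[b] += w

print(strength["machine learning"])
```

Deduplicating each article’s keyword list before pairing (via `set`) ensures a repeated keyword within one article is not double-counted, matching the usual full-counting convention for co-occurrence networks.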

4.6.2. Associated Keyword Analysis

The analysis extends to how specific keywords interlink. For instance, ‘deep learning’ (score 222.0) interacts with ‘citation count prediction’ (weight 9.0), emphasising its predictive role in academic trends. ‘Topic modelling’ (202.0) connects with ‘collaboration networks’ (weight 16.0), which helps uncover patterns in scholarly publications that inform policy and funding. This analysis quantifies relationships through collaboration scores and link weights, shedding light on the computational themes and complex networks that drive scholarly communication.
Figure 23 presents a comprehensive analysis of machine learning and artificial intelligence applications in research impact, revealing three distinct clusters with unique collaboration dynamics and thematic focuses. The largest cluster, with 145 entities, reflects a broad approach to integrating AI across diverse research domains. The second cluster, with 95 members, emphasises practical AI implementations, while the smallest cluster, with 82 members, focuses on specialised methodological advancements.
In the largest cluster, ‘science’ leads with a collaboration score of 495, serving as a hub for interdisciplinary research leveraging AI. ‘Impact’ (score 480) and ‘performance’ (193) further highlight this cluster’s role in enhancing research dissemination and optimising outputs through advanced analytics. Other significant members, such as ‘evolution’ and ‘knowledge’, link foundational research themes with emerging AI technologies, interacting with entities like ‘AI’ and ‘algorithms’. In the second cluster, ‘management’ (181) and ‘systems’ (131) underscore AI’s role in research management. Their connections with ‘AI’ and ‘adoption’ highlight the practical application of AI tools in managing complex research projects. The third cluster features ‘classification’ (130) and ‘model’ (156), focusing on technical rigour in AI-driven research methodologies. Their strong ties to ‘analytics’ and ‘artificial intelligence’ reflect their role in predictive modelling and data classification. This analysis illustrates the diversity of AI applications in research and highlights the distinct roles of top entities within each cluster.

4.6.3. Research Topic Analysis

The topic modelling effort on machine learning and artificial intelligence for research impact dimensions was conducted using the BERTopic method, resulting in four key thematic groups: ‘Bibliometric and scientometric analysis in AI and ML research’, ‘Machine learning and artificial intelligence for bibliometric analysis’, ‘Machine learning and artificial intelligence for specific dimensions of research impact’, and ‘Machine learning, artificial intelligence, and advanced methodologies for research impact applications’. These topics were interpreted using multiple labelling strategies, including representation, KeyBERT, MMR, and Llama 2, providing a comprehensive understanding of the literature and capturing applications like research trend analysis, altmetrics, and AI-driven citation prediction. Of 1608 documents, 407 were left uncategorised in the outlier topic (Topic -1), while the remaining 1201 were distributed across 35 distinct topics. This distribution reflects the diversity of AI and ML applications in research impact analysis. The detailed outcomes of the topic modelling, including the categorised topics and their scholarly interpretations, are presented in Appendix A, Table A1. Additionally, the visualisations of the topic modelling results, including the distribution of documents, similarity matrix, word scores, and inter-topic distance map, are provided in Appendix A, Figure A1. The following sections provide a detailed exploration of the key topics identified within each thematic group.
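BERTopic derives its topic keywords through class-based TF-IDF (c-TF-IDF): documents are clustered, each cluster is treated as one large document, and terms are weighted by in-cluster frequency times a corpus-level inverse-frequency factor. The following standard-library sketch reproduces only that scoring step on an invented two-cluster mini-corpus; the full pipeline (transformer embeddings, UMAP reduction, HDBSCAN clustering) is omitted:

```python
import math
from collections import Counter

# Invented mini-corpus: two clusters of whitespace-tokenised documents.
clusters = {
    "bibliometrics": ["citation analysis h-index citation impact",
                      "citation networks and bibliometric indicators"],
    "education":     ["online learning outcomes in higher education",
                      "learning analytics for student outcomes"],
}

def ctfidf(clusters):
    """Class-based TF-IDF: score term t in cluster c as
    tf(t, c) * log(1 + avg_words_per_cluster / corpus_freq(t))."""
    class_tf = {c: Counter(" ".join(docs).split())
                for c, docs in clusters.items()}
    corpus_freq = Counter()
    for tf in class_tf.values():
        corpus_freq.update(tf)
    avg_words = sum(sum(tf.values()) for tf in class_tf.values()) / len(class_tf)
    return {c: {t: n * math.log(1 + avg_words / corpus_freq[t])
                for t, n in tf.items()}
            for c, tf in class_tf.items()}

scores = ctfidf(clusters)
top = max(scores["bibliometrics"], key=scores["bibliometrics"].get)
# 'citation' dominates the bibliometrics cluster in this toy corpus
```

The highest-scoring terms per cluster become the topic’s keyword label, which is then refined by the labelling strategies (KeyBERT, MMR, Llama 2) mentioned above.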
Machine Learning and Artificial Intelligence for Bibliometric Analysis: The results from the BERTopic modelling for the ‘Machine learning and artificial intelligence for bibliometric analysis’ group present a compelling analysis of how AI and machine learning are applied to bibliometric research and research impact assessment. This group encompasses two topics that emphasise the integration of AI into traditional bibliometric methods, enhancing the understanding of research impact, collaboration, and citation dynamics.
Topic 14: ML/AI-driven bibliometrics and research impact assessment examines the use of machine learning and artificial intelligence to improve the accuracy and predictive power of bibliometric indicators. Key terms such as ‘citation’, ‘research’, and ‘scientometrics’ suggest that AI is being used to predict the impact of highly cited papers and assess research productivity. Representative documents on this topic illustrate how machine learning algorithms, such as support vector machines (SVMs) and random forests, predict the quality and citation potential of research papers. The focus is on enhancing traditional bibliometric techniques, such as citation count analysis and peer review, through AI-driven models, ultimately leading to more efficient and accurate research impact evaluations. The documents provide examples of AI’s utility in identifying ‘sleeping beauties’ (papers that go unnoticed for years before becoming highly influential) and in mitigating biases in citation analysis.
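The common setup behind these prediction studies is supervised learning from early paper features to later citation counts. The studies in this topic use SVMs and random forests; the toy 1-nearest-neighbour stand-in below illustrates only the feature-to-citation framing, and every feature value and citation count is invented:

```python
# Toy citation prediction: map early features of a paper to its later
# citation count. All numbers are invented for illustration.
train = [  # (early_citations, n_authors, venue_rank) -> citations at 5 years
    ((2, 3, 0.9), 40),
    ((0, 1, 0.2), 3),
    ((5, 6, 0.8), 120),
    ((1, 2, 0.5), 15),
]

def predict(features):
    """Return the 5-year citation count of the closest training paper."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda row: sq_dist(row[0], features))
    return nearest[1]

pred = predict((4, 5, 0.85))  # closest to the (5, 6, 0.8) paper -> 120
```

A production model would replace the nearest-neighbour lookup with a fitted SVM or random forest and use far richer features (venue, co-authorship, early altmetrics).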
Topic 15: ML/AI-enhanced bibliometric analysis and publication impact evaluation delves into the application of artificial intelligence to evaluate citation impact, scholarly collaboration, and institutional productivity. Terms like ‘h-index’, ‘collaboration’, and ‘PageRank’ highlight the role of AI in assessing academic performance through advanced metrics. The representative documents explore how AI models rank journals and institutions, providing more nuanced evaluations of scholarly contributions. For instance, AI-enhanced citation analysis tools such as PageRank and the h-index are employed to assess the relative importance of researchers and their institutions within global scholarly networks. Furthermore, the documents examine the use of AI to evaluate international collaboration patterns and to predict future trends in citation and publication impact, reinforcing AI’s role in driving innovation in bibliometric analysis.
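The h-index referenced in this topic is straightforward to compute: it is the largest h such that an author has h papers with at least h citations each. A minimal sketch on hypothetical citation counts:

```python
def h_index(citations):
    """Largest h such that h papers each have >= h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one author's five papers.
h = h_index([10, 8, 5, 4, 3])  # -> 4 (four papers with >= 4 citations)
```

AI-enhanced evaluation pipelines typically combine such simple indicators with network-based metrics like PageRank rather than replacing them.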
These topics demonstrate the growing reliance on AI and machine learning in bibliometrics. By leveraging advanced algorithms, researchers can now achieve more granular and predictive insights into research impact, enabling more comprehensive assessments of academic performance and collaboration. Integrating AI into bibliometric analysis transforms how scholarly impact is measured, making it more robust, data-driven, and scalable.
Machine Learning and Artificial Intelligence for Specific Dimensions of Research Impact: The ‘Machine learning and artificial intelligence for specific dimensions of research impact’ group highlights how AI and machine learning techniques evaluate research performance, predict citation impact, and enhance scholarly collaboration. The topics within this group focus on various aspects of research evaluation, citation analysis, and impact prediction, showcasing the broad applications of machine learning in these areas. These topics demonstrate AI and machine learning’s transformative role in advancing bibliometric analysis and research impact assessment. By utilising advanced algorithms, these models introduce innovative methods for evaluating research productivity, predicting citation trends, and fostering collaboration, thereby enriching the understanding of research impact across multiple dimensions.
Topic 5: ML/AI-driven evaluation metrics and models for research performance and innovation assessment explores machine learning models, such as neural networks and support vector machines (SVMs), for evaluating research projects and predicting research output. These models assist in risk analysis, project performance assessment, and innovation management, utilising advanced techniques like wavelet analysis, manifold learning, and sparse representation. The topic emphasises the importance of AI-driven methods in improving research evaluation processes and innovation impact assessments by providing more accurate predictive models.
Topic 6: Sentiment and citation impact analysis through machine learning techniques focuses on applying sentiment analysis and citation impact measurement using natural language processing (NLP) and ML algorithms. This topic covers various approaches to sentiment extraction from scientific articles, analysing citation polarity and in-text citation patterns to evaluate research impact. Techniques such as sentiment classification and feature extraction help predict the influence of academic work, highlighting the role of sentiment-driven citation analysis in understanding research dynamics and scholarly communication.
Topic 8: Impact and innovation in bibliographic data extraction and analysis covers the use of machine learning for bibliographic disambiguation and author identification in digital libraries. Neural networks, natural language processing (NLP), and support vector machines (SVMs) are utilised to disambiguate author names and classify bibliographic records, improving the accuracy of metadata extraction and citation linking. This topic emphasises the critical role of machine learning in automating bibliographic data processing, enhancing the quality of academic databases, and supporting more precise citation analyses.
Topic 12: Predictive modelling of citation impact and research metrics delves into applying deep learning models for predicting citation counts and research impact. Techniques such as convolutional neural networks (CNNs) and BERT-based models are used to forecast citation trends and evaluate scholarly contributions. This topic demonstrates how machine learning can enhance traditional scientometric analysis by providing robust, data-driven predictions of future citation patterns, thus aiding research evaluation and performance assessment.
Topic 13: ML/AI in research evaluation and methodology highlights the application of machine learning techniques in evaluating qualitative and quantitative research, particularly in areas like policing and social work. By leveraging tools such as NLP and crowdsourcing, AI methodologies enhance the evaluation of complex social issues, including domestic violence and community policing strategies. This topic emphasises the value of AI in improving research methodologies and enhancing the reliability and scope of qualitative research through automation and classification models.
Topic 23: Advanced techniques in bibliometric and scientometric analysis for understanding research impact and collaboration delves into gender and diversity issues in scholarly research through the lens of advanced scientometric techniques. The analysis uncovers persistent gender biases in various fields and evaluates how gender differences impact research performance, citation patterns, and collaboration dynamics. Leveraging techniques such as topic modelling, BERT-based sentiment analysis, and co-citation analysis, this topic provides insights into how social factors influence research impact, productivity, and recognition, focusing on understanding disparities in academic environments. The findings emphasise the importance of diversity and inclusion in research impact analysis, revealing the need to address gender inequalities in scholarly communication.
Topic 30: Impact of deep learning and machine learning on citation analysis and recommender systems explores the intersection of citation analysis and personalised recommendation systems in academic research. Machine learning models recommend scholarly papers based on citation context, author profiles, and network-based algorithms like PageRank. This topic underscores the importance of ML in refining information retrieval systems and enhancing academic discovery through recommender systems that personalise research outputs for individual scholars.
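PageRank, which this topic applies to ranking papers within citation networks, can be sketched with plain power iteration. The four-paper graph below is invented for illustration; dangling nodes (papers citing nothing in the corpus) spread their rank uniformly:

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank over {paper: [papers_it_cites]}."""
    nodes = list(links)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in nodes}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling node: distribute rank uniformly
                for q in nodes:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical citation graph: papers A, B, and C all cite D.
rank = pagerank({"A": ["D"], "B": ["D"], "C": ["D"], "D": []})
```

In recommender settings, such network scores are combined with citation-context and author-profile features to personalise paper suggestions.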
Topic 31: Impact of deep learning and machine learning on research collaboration prediction and network analysis focuses on predicting research collaboration through network analysis and link prediction models. Deep learning techniques, such as autoencoders and graph-based models, are employed to forecast co-authorship patterns and academic collaborations. The topic highlights how ML models can identify potential collaborations, analyse network centrality, and predict future research trends, thereby fostering a deeper understanding of academic networks and their impact on research productivity.
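The link-prediction task this topic describes scores author pairs by how likely they are to collaborate next. The studies use deep models (autoencoders, graph networks); the classical Adamic–Adar baseline below conveys the same idea in a few lines, on an invented co-authorship graph (it assumes every common neighbour has degree greater than one, since log(1) = 0):

```python
import math

# Hypothetical co-authorship graph: author -> set of past co-authors.
coauthors = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}

def adamic_adar(u, v):
    """Score a potential future collaboration by shared co-authors,
    down-weighting common neighbours who collaborate with everyone."""
    common = coauthors[u] & coauthors[v]
    return sum(1.0 / math.log(len(coauthors[w])) for w in common)

score_ad = adamic_adar("A", "D")  # A and D share co-authors B and C
```

Ranking all unconnected pairs by such a score yields a candidate list of future collaborations, which deep models refine with node embeddings and temporal signals.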
Topic 33: Impact of proximity dimensions and topic modelling on international research collaboration focuses on how geographical and disciplinary proximity influences international research collaboration. The analysis explores the dynamics of collaboration networks, explicitly addressing the role of factors such as occupational mobility, proximity, and topic modelling in shaping research patterns across borders. The topic highlights the application of Bayesian networks and social network analysis to examine how proximity dimensions affect research productivity and collaboration outcomes, focusing on fields such as healthcare and journalism. This demonstrates the role of ML in analysing complex, multidisciplinary research networks, enabling a deeper understanding of how proximity and integration influence collaboration and research output.
Topic 34: Machine learning and statistical methods for research evaluation and performance assessment explores machine learning models, such as support vector machines (SVMs) and parallel computing in evaluating research performance across various domains. This topic emphasises the application of ML in automating research evaluation processes, particularly in large-data environments, such as vocational colleges and scientific projects. Techniques like kernel methods and Fourier analysis are applied to assess performance metrics and improve evaluation accuracy. The integration of ML in performance assessment underscores its capacity to enhance the scalability and precision of research evaluation, providing more comprehensive insights into the factors driving research excellence and impact.
Machine Learning, Artificial Intelligence, and Advanced Methodologies for Research Impact Applications: The group ‘Machine learning, artificial intelligence, and advanced methodologies for research impact applications’ examines how AI and ML are applied across fields like education, environmental sustainability, and emerging technologies to enhance research evaluation and impact analysis. The topics illustrate how AI- and ML-driven approaches contribute to research assessment, sustainability goals, and ethical practices. This analysis highlights the pivotal role of AI and ML in advancing research evaluation, ethics, and sustainability across diverse fields, as demonstrated by the topical outcomes of the BERTopic model.
Topic 7: Impact and evaluation of online learning through bibliometric and scientometric analysis focuses on the rise of online and distance learning, particularly during the COVID-19 pandemic. Applying scientometric techniques and machine learning in this domain highlights trends in online education, student engagement, and the effectiveness of massive open online courses (MOOCs). Bibliometric analysis is used to assess the impact of online learning on academic success and retention, offering insights into how digital education is shaping the future of higher education.
Topic 17: Bibliometric and AI-enhanced analysis of sustainable development and technological innovations explores the intersection of AI and sustainable development, particularly in achieving the United Nations’ Sustainable Development Goals (SDGs). The analysis showcases how machine learning models, combined with bibliometric techniques, assess sustainability efforts, corporate responsibility, and environmental innovations. Researchers can map out technological innovations and their contributions to sustainability goals by leveraging topic modelling and path analysis, providing insights into waste management, renewable energy, and climate change mitigation.
Topic 19: Impact of active learning and assessment strategies on educational outcomes examines the role of active learning and innovative assessment strategies in improving educational outcomes. It emphasises using machine learning models to evaluate teaching methods, such as case-based learning and simulation exercises. The topic demonstrates how AI can enhance curriculum development and student assessment, particularly in STEM disciplines and professional education programs, fostering more effective learning environments.
Topic 21: Impact of online and digital learning innovations on professional development and educational outcomes explores the role of digital learning innovations in professional education, particularly in the health and medical fields. The analysis emphasises how machine learning-enhanced online courses, such as MOOCs and virtual classrooms, are used to train healthcare professionals. These tools are instrumental in improving education outcomes in areas such as nursing, medical education, and counsellor training, further highlighting the role of AI in supporting continuous professional development in health care.
Topic 24: Advanced evaluation models and algorithms in educational, environmental, and market research demonstrates the significant role of advanced machine learning algorithms, such as support vector machines (SVMs), k-means clustering, and gradient boosting, in research evaluation across various fields. This topic highlights machine learning techniques for segmentation, clustering, and evaluation in educational platforms, business markets, and environmental contexts. The application of these models enables more accurate decision-making processes, such as evaluating the price of second-hand commodities and improving teaching quality assessments, illustrating the practical impact of machine learning on real-world applications.
Topic 27: Bibliometric and scientometric analysis of ethical and responsible research in AI focuses on the critical theme of ethical AI, emphasising the need for responsible and explainable AI (XAI). This topic covers the ethical challenges associated with AI research, including fairness, transparency, and bias in machine learning systems. Using bibliometric methods to track the evolution of ethics considerations in AI research provides valuable insights into the global discourse on AI ethics, helping guide future research toward more accountable and transparent AI systems.
Topic 28: Bibliometric and scientometric analysis of emerging technologies in cryptocurrency and blockchain explores the impact of emerging technologies, such as blockchain, the Internet of Things (IoT), and cryptocurrencies, through a machine learning and scientometric lens. This topic highlights how AI and ML are being applied to improve security, privacy, and decision-making in blockchain applications, 5G networks, and cloud computing. Researchers can track advancements in these technologies and their influence on finance, supply chains, and cyber-physical systems by leveraging bibliometric analyses.
Topic 32: Impact of active learning and educational innovations on student engagement and learning outcomes highlights the transformative impact of active learning methodologies, such as flipped classrooms, on student engagement and learning outcomes. Machine learning and bibliometric analyses are employed to evaluate the effectiveness of these pedagogical approaches, particularly in higher-education settings. The topic underscores how active learning fosters interdisciplinary collaboration and improves academic performance, with bibliometric methods helping to map the scientific contributions to this educational innovation.
Bibliometric and Scientometric Analysis of ML and AI in Research Impact: Many studies have employed bibliometric and scientometric analysis to assess the use of machine learning and artificial intelligence in research evaluation, trend analysis, performance measurement, and impact assessment. This topic group provides an in-depth exploration of various applications of bibliometrics and scientometrics, enhanced by AI and ML, to evaluate research impact and identify emerging trends. The section covers several key topics, offering valuable insights into how AI and ML transform research evaluation and trend prediction across various disciplines.
Topic 0: Topic modelling and research trend analysis in scholarly literature emphasises the role of topic modelling, such as latent Dirichlet allocation (LDA), in uncovering trends and patterns within scholarly publications. This topic highlights the application of natural language processing (NLP) and text mining techniques to analyse vast datasets, identifying emerging research fronts and key scientific trends. By integrating bibliometrics and topic modelling, researchers can predict the evolution of scientific fields, aiding in the strategic decision-making process for academic institutions and policymakers.
Topic 1: Research evaluation using altmetrics and citation analysis in scholarly communication focuses on the increasing importance of altmetrics as a complement to traditional citation-based metrics. This topic reveals how altmetrics derived from social media platforms like X can influence research impact evaluation through ML techniques, such as sentiment analysis and predictive modelling. Integrating altmetrics with citation analysis provides a more comprehensive picture of a publication’s influence, particularly in the digital age, where online engagement is a significant metric of scholarly communication.
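The sentiment-analysis side of altmetric evaluation can be illustrated with the simplest possible model: a lexicon-based scorer over social-media mentions. The lexicon and example posts below are invented stand-ins for the trained ML sentiment models the topic describes:

```python
# Invented sentiment lexicon: word -> polarity score.
LEXICON = {"groundbreaking": 2, "useful": 1, "interesting": 1,
           "flawed": -2, "overhyped": -1}

def mention_sentiment(text):
    """Average lexicon score over one mention's words (0.0 if none match)."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

# Hypothetical social-media mentions of one publication.
posts = ["Groundbreaking and useful study",
         "Interesting but overhyped results"]
avg = sum(mention_sentiment(p) for p in posts) / len(posts)
```

Aggregating such per-mention scores alongside citation counts gives the combined altmetric-plus-citation picture the topic advocates; production systems replace the lexicon with transformer-based classifiers.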
Topic 2: Bibliometric analysis of artificial intelligence research underscores the use of bibliometric techniques to analyse AI research, with a particular focus on large language models, such as GPT-3, GPT-3.5, and GPT-4 as deployed in ChatGPT, and their applications in various domains. This topic explores the intersection of AI and education, highlighting the role of automated systems, such as intelligent tutoring systems, in shaping educational practices. Integrating bibliometric analysis with AI enables the identification of key trends and challenges in AI research, providing valuable insights into the future trajectory of the field.
Topic 3: Bibliometric analysis and research evaluation of metaheuristic algorithms delves into applying optimisation algorithms in research evaluation, such as particle swarm optimisation (PSO) and genetic algorithms. The bibliometric analysis highlights how these metaheuristic methods are employed in various fields, focusing on their role in solving complex, multi-objective optimisation problems. The results reveal trends in algorithm development and their practical implications in research impact assessment, underlining the relevance of such algorithms in enhancing research reliability and performance evaluation.
Topic 4: Causal inference in research evaluation and bibliometric impact analysis provides insights into how causal inference methods, such as Bayesian networks and propensity score matching, are applied to bibliometric studies to infer relationships between research outcomes and their impact. The topic emphasises the importance of causal methods in handling bias and assessing the effectiveness of research policies, particularly in observational studies. It also highlights the growing application of these techniques in clinical research evaluation and citation impact analysis, providing a robust framework for assessing the true influence of research findings.
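Propensity score matching, one of the causal methods this topic covers, pairs each treated unit (e.g. a funded paper) with the control unit whose estimated propensity is closest, then averages the outcome differences. A toy sketch with invented scores and citation outcomes (real studies first fit a model to estimate the propensities):

```python
# Hypothetical (propensity, citation outcome) pairs; treated = funded papers.
treated = [(0.80, 30), (0.60, 22)]
control = [(0.75, 24), (0.55, 20), (0.20, 5)]

def att(treated, control):
    """Average treatment effect on the treated via 1-to-1
    nearest-propensity matching (with replacement)."""
    effects = []
    for score, outcome in treated:
        _, matched_outcome = min(control, key=lambda c: abs(c[0] - score))
        effects.append(outcome - matched_outcome)
    return sum(effects) / len(effects)

effect = att(treated, control)  # ((30-24) + (22-20)) / 2 = 4.0
```

Matching on the propensity score rather than raw covariates is what lets observational bibliometric studies approximate the randomised comparison they cannot run.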
Topic 9: Bibliometric and scientometric analysis of artificial intelligence in education explores the impact of AI in the education sector through bibliometric studies. This topic provides insights into how AI transforms teaching, learning, and academic integrity by mapping trends and analysing AI tools such as ChatGPT in higher education. The systematic review of AI in educational contexts highlights the emerging challenges and opportunities in integrating AI-driven technologies into academic settings, underscoring its role in enhancing educational outcomes.
Topic 10: Research impact through bibliometric, scientometric, and multivariate analysis presents an advanced application of bibliometrics and multivariate techniques for assessing research performance. This topic delves into the use of principal component analysis and regression models to evaluate institutional research output, providing a quantitative framework for determining scientific productivity. By leveraging ML methods, this topic enhances the precision of research impact assessments, facilitating more informed decisions in research management and policy formulation.
Topic 11: Bibliometric and information retrieval impact analysis focuses on the intersection of information retrieval systems and bibliometric analysis, particularly in the medical domain. The topic emphasises using machine learning algorithms to improve the precision and recall of bibliographic data extraction, with applications in PubMed, Medline, and systematic reviews. This analysis highlights how advanced retrieval techniques, such as semantic search and deep learning models, are revolutionising the accuracy and efficiency of bibliographic searches in medical research, ultimately enhancing citation analysis and research evaluation.
Topic 16: Bibliometric and scientometric analysis for research trends and impact assessment investigates the digital transformation in business management through bibliometric analysis. This topic showcases the use of scientometric methods to analyse trends in digitalisation, innovation, and entrepreneurship. By applying topic modelling techniques, this topic highlights the evolving nature of digital transformation and its impact on business practices, offering a rich perspective on integrating AI into corporate strategies and industrial systems.
Topic 18: Bibliometric and scientometric analysis of artificial intelligence research trends and impact focuses on mapping the research landscape of AI through bibliometric and scientometric techniques. This topic examines the adoption of AI across multiple disciplines, including physical, social, and life sciences, providing a comprehensive view of AI’s scientific impact. The integration of ML-based tools, such as VOSviewer and LDA, identifies key research domains and visualises intellectual networks, offering a data-driven approach to understanding AI’s transformative role across sectors.
Topic 20: Bibliometric and machine learning approaches to analysing research trends and impact in artificial intelligence investigates integrating neural networks and machine learning techniques into bibliometric analysis, particularly for categorisation and trend analysis. The topic covers the use of neural networks for document classification, citation analysis, and research impact prediction, underscoring the utility of ML in improving the accuracy and depth of bibliometric evaluations. This approach enhances the ability to track research trends across domains such as biomedicine, supply chain management, and environmental science.
Topic 22: Bibliometric and scientometric trends in AI and ML across diverse domains focuses on AI and ML applications in specialised areas such as autism spectrum disorder (ASD) and rehabilitation robotics. The bibliometric analysis identifies key trends in using AI to address complex medical challenges, offering insights into research collaborations and scientific advancements in this field. The systematic review of ASD research through AI techniques highlights the potential of ML in improving diagnosis and treatment strategies, emphasising the broader implications of AI’s role in addressing critical healthcare issues.
Topic 25: Bibliometric and scientometric analysis of artificial intelligence research trends and applications focuses on AI’s application across diverse sectors, including machine learning-based visualisation techniques and failure detection. This topic provides a comprehensive view of AI-driven bibliometric analysis, with applications ranging from facial recognition systems to cumulative damage modelling in materials science. By mapping the research landscape using tools like VOSviewer, this analysis uncovers the intricate network of subfields and their convergence in AI research, emphasising how these methods support the prediction of research trends and enhance bibliometric evaluations.
Topic 26: Bibliometric and scientometric analysis of machine learning and educational technology highlights the critical role of machine learning in shaping educational technology research. Through bibliometric techniques and advanced topic modelling, this topic explores the evolution of learning analytics, educational technologies, and the integration of structural topic modelling for trend discovery. The analysis covers quantitative aspects of research collaboration and thematic trends in educational technology. The application of machine learning models, such as TinyML, is also noted for its impact on academic research, particularly in learning-based systems, collaboration, and institutional development.
Topic 29: Bibliometric analysis of deep learning and machine learning technologies in various domains sheds light on applying deep learning and ML techniques across different sectors, including healthcare, fraud detection, and chronic disease management. This topic highlights the use of convolutional neural networks (CNNs) and long short-term memory (LSTM) models in bibliometric analysis to map the impact of ML technologies. It emphasises the role of these models in identifying emerging research areas and predicting future trends, offering a comprehensive perspective on how ML and bibliometric analysis are intertwined to enhance research impact assessments.

5. Discussion

5.1. Overview of the Key Findings

This study addresses the overarching research question ‘How can data science, particularly machine learning and artificial intelligence, be used to provide dynamic insights into research impact by uncovering emerging themes, interdisciplinary connections, and leading entities within research impact dimensions?’ Specifically, it responds to sub-questions that explore the current status of ML/AI in research impact assessment, identify prominent methodologies and emerging themes, and propose directions for future research. The findings presented here mark a significant evolution in research impact evaluation, transitioning from reliance on traditional academic metrics such as citation counts, h-index, and journal impact factors to more dynamic, data-driven methodologies [1,2]. While conventional methods effectively capture academic influence, they fall short of addressing the broader societal, policy, and economic dimensions of research contributions [3].
Machine learning (ML) and artificial intelligence (AI) techniques have emerged as pivotal tools in reshaping research impact evaluation. This study identifies a notable increase in the adoption of ML/AI methodologies, with significant growth in relevant publications from 2010 to 2024 and a peak in 2022. This reflects heightened academic and institutional interest in leveraging these technologies for more comprehensive evaluations. The geographical distribution of contributions highlights China’s leadership in this domain, with the United States and other nations also demonstrating strong engagement. This trend underscores a global shift toward data-driven evaluation practices driven by strategic research investments in ML/AI methodologies.
Key tools and methodologies, including CiteSpace, VOSviewer, and BERTopic, are revolutionising traditional bibliometric practices. These tools provide dynamic insights by mapping co-authorship networks, identifying citation bursts, and uncovering thematic trends. For example, BERTopic leverages transformer-based embeddings to classify emerging themes and interdisciplinary connections, surpassing the capabilities of earlier topic modelling techniques [6]. Such advancements align directly with the sub-question ‘What are the prominent methodologies and emerging research themes in applying machine learning and artificial intelligence to research impact assessment?’ Similarly, CiteSpace’s citation burst analyses offer temporal insights, helping to pinpoint pivotal research moments and further demonstrating how ML/AI enables nuanced and actionable evaluations of research impact.
These findings also address the sub-question ‘What is the current status of research impact evaluation using machine learning and artificial intelligence?’ by highlighting how these technologies are reshaping the landscape of research impact science. Predictive computational models, such as neural networks and decision trees, quantify future citation trajectories to identify emerging trends in scholarly influence. Concurrently, natural language processing (NLP)—particularly sentiment analysis—decodes contextual nuances in academic discourse, enabling granular assessment of how research is received, debated, and applied within scholarly communities. These innovations contribute to a forward-looking approach to research impact evaluation, emphasising the transition from static bibliometric metrics to dynamic, predictive frameworks.
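The sentiment-analysis step described above can be sketched at its simplest: lexicon-based polarity scoring of citation contexts. The tiny lexicon and example sentences below are hypothetical placeholders; production NLP pipelines would use trained models rather than word lists:

```python
# Minimal lexicon-based sentiment scoring of citation sentences.
# Lexicon and sentences are illustrative assumptions, not study data.
POSITIVE = {"seminal", "robust", "novel", "comprehensive", "influential"}
NEGATIVE = {"flawed", "limited", "inconclusive", "outdated", "biased"}

def citation_polarity(sentence: str) -> float:
    """Score a citation sentence in [-1, 1] from lexicon hits."""
    words = [w.strip(".,;:()").lower() for w in sentence.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(citation_polarity("This seminal study offers a robust framework."))
print(citation_polarity("The evidence remains limited and inconclusive."))
```

Aggregating such scores across all sentences citing a paper is one way to move from counting citations to characterising how work is received and debated.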

5.2. Prominent Methodologies and Emerging Research Themes in Machine Learning and Artificial Intelligence for Research Impact Assessment

This study also delves into the sub-question ‘What are the prominent methodologies and emerging research themes in applying machine learning and artificial intelligence to research impact assessment?’ Bibliometric tools such as VOSviewer, CiteSpace, and Gephi form the foundation of research impact analysis, enhanced by ML/AI algorithms. These tools enable the visualisation of collaborative networks, identifying influential contributors and uncovering hidden patterns within scholarly data [15]. Predictive modelling has advanced research impact assessment by incorporating publication venues, co-authorship dynamics, and early citation counts to forecast trends accurately [112,113].
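The network measures these tools report can be computed by hand for intuition. The sketch below builds a weighted co-authorship graph from hypothetical author lists and ranks authors by weighted degree (total collaboration strength), one of the simplest notions of an “influential contributor”:

```python
from collections import Counter
from itertools import combinations

# Hypothetical co-author lists for four papers (illustrative only).
papers = [
    ["Chen", "Li", "Zhang"],
    ["Chen", "Li"],
    ["Zhang", "Kumar"],
    ["Chen", "Kumar"],
]

# Each unordered author pair on a paper is one co-authorship tie;
# repeated collaborations increase the edge weight.
edges = Counter()
for authors in papers:
    for a, b in combinations(sorted(authors), 2):
        edges[(a, b)] += 1

# Weighted degree centrality: sum of an author's edge weights.
degree = Counter()
for (a, b), w in edges.items():
    degree[a] += w
    degree[b] += w

for author, score in degree.most_common():
    print(author, score)
```

VOSviewer’s “total link strength” is this same quantity at corpus scale; Gephi and CiteSpace add further measures (betweenness, burstness) on the same underlying graph.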
Advanced topic modelling techniques, including BERTopic, represent significant methodological innovations. By leveraging transformer-based embeddings, BERTopic offers a granular exploration of emerging themes, addressing critical gaps in traditional topic modelling approaches [6]. For instance, themes such as equity in research collaborations and sustainability have emerged as central focus areas, reflecting evolving priorities in research impact evaluation [10]. AI-driven altmetrics further expand the scope of impact assessment by capturing real-time societal engagement, offering a dynamic understanding of research visibility and relevance [114].
Ethical considerations are another emerging theme, with fairness-aware AI models becoming critical to ensuring equity and transparency in research evaluations [115]. This aligns with the growing emphasis on inclusivity, interdisciplinary collaboration, and sustainability as critical areas for research impact assessment. These themes not only reflect global challenges such as climate change and public health but also underscore the transformative potential of ML/AI in addressing these issues through data-driven insights [116].
Finally, integrating ML/AI into qualitative and mixed-method approaches addresses the limitations of traditional quantitative metrics. Combining AI-driven text mining and sentiment analysis with stakeholder interviews provides a more comprehensive understanding of research influence, aligning evaluations with societal priorities and ethical considerations [2]. These advancements represent a paradigm shift in research impact evaluation. ML/AI methodologies enable the dynamic mapping of research ecosystems, uncovering pathways through which research contributes to academic, societal, and policy-level outcomes.

5.3. Future Directions for Integrating Advanced Computational Techniques into Multidimensional Research Impact Evaluation

The application of machine learning (ML) and artificial intelligence (AI) has catalysed transformative advancements in research impact evaluation, yet significant opportunities remain to refine and expand these methodologies. Future directions in this field emphasise the need for adaptable, dynamic, and inclusive frameworks that address the multifaceted nature of research contributions while leveraging advanced computational techniques.
One critical avenue for advancement is the integration of societal, economic, and policy dimensions into existing evaluation frameworks. Current methods often prioritise academic outputs, but emerging computational models, such as dynamic graph-based analyses and reinforcement learning, could enable the nuanced tracking and prediction of interdisciplinary knowledge flows, institutional collaborations, and policy influences [115,117]. For instance, these models could visualise the progression of research contributions within domains like public health or environmental sustainability, bridging the gap between academic metrics and societal outcomes.
Predictive analytics holds particular promise in enhancing the forward-looking capacity of research evaluation. Techniques such as neural networks, ensemble models, and decision trees could incorporate co-authorship patterns, publication venues, and early citation dynamics to project future academic influence [113,118]. To overcome limitations inherent in traditional bibliometric datasets, integrating alternative data sources—including policy documents, patents, and real-time social media discourse—through natural language processing (NLP) could provide richer, context-aware insights into research trajectories. Such frameworks would enable stakeholders to anticipate and respond proactively to emerging trends and disruptions in scholarly ecosystems.
The development of real-time impact monitoring systems represents another transformative direction. By leveraging AI-driven streaming analytics, these systems could continuously track and evaluate research contributions as they unfold across academic and non-academic domains. Tools like CiteSpace and VOSviewer could be adapted to ingest real-time data, enabling dynamic analyses of citation bursts, collaborative networks, and thematic shifts. These capabilities would empower academic institutions, policymakers, and funding bodies to make timely, data-driven resource allocation and research strategy decisions.
A pressing challenge for future research is addressing systemic biases embedded in computational models. Existing frameworks often disproportionately favour well-resourced institutions, English-language publications, and dominant disciplines, perpetuating global inequities in research recognition [10,15]. Fairness-aware algorithms trained on multilingual and diverse datasets are essential for mitigating these biases. Such models could amplify contributions from underrepresented regions and disciplines, fostering a more equitable and comprehensive evaluation of global research impact. For example, regionally tailored models could highlight the unique contributions of researchers in low-resource settings, promoting inclusivity in the evaluation process.
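One simple, widely used step toward the fairness goals above is field normalisation: scoring each paper against the average of its own field rather than a global baseline, so low-citation disciplines are not structurally penalised. A sketch with hypothetical numbers:

```python
from collections import defaultdict

# Hypothetical papers: (field, citation count). Illustrative only.
papers = [
    ("oncology", 120), ("oncology", 80),
    ("mathematics", 12), ("mathematics", 4),
]

# Field-normalised citation score: citations / mean citations in the field,
# so a well-cited mathematics paper is not drowned out by high-citation fields.
totals, counts = defaultdict(int), defaultdict(int)
for field, c in papers:
    totals[field] += c
    counts[field] += 1
field_mean = {f: totals[f] / counts[f] for f in totals}

normalised = [(f, c, c / field_mean[f]) for f, c in papers]
for f, c, score in normalised:
    print(f"{f}: {c} citations -> normalised score {score:.2f}")
```

The same normalisation logic extends to regions and languages, which is the grouping that matters most for the equity concerns discussed here.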
Interdisciplinary research presents a fertile ground for methodological innovation. Complex societal challenges, such as climate change and global health crises, demand insights that transcend disciplinary silos. Advanced computational tools, such as graph embeddings and cross-disciplinary network analyses, could map the synergies among diverse research fields, offering actionable insights to funding agencies and institutions [119]. These models could, for instance, identify how collaborations between AI researchers and public health experts catalyse innovations in healthcare delivery or disease prevention.
Ethical considerations must remain central to the evolution of AI-driven research impact evaluation. Developing transparent, accountable, and inclusive frameworks is vital for maintaining trust among researchers, institutions, and policymakers [115,117]. Future research should focus on establishing rigorous protocols for validating algorithmic outputs, ensuring stakeholder engagement, and mitigating unintended consequences, such as overemphasising short-term metrics. These efforts would align computational practices with the broader values of equity, inclusivity, and accountability.
Finally, mixed-method approaches integrating qualitative insights with quantitative analyses could provide a more holistic understanding of research impact. For instance, combining sentiment analysis of public discourse with stakeholder interviews could uncover the broader societal resonance of academic contributions [2,120]. This integration would address the limitations of purely numeric metrics, capturing nuanced dimensions such as shifts in public perception or the influence of academic work on policy discourse.
Overall, the future of research impact evaluation lies in developing adaptable, context-aware systems that leverage the full potential of advanced computational techniques. The field can transition toward more inclusive and dynamic evaluation frameworks by integrating diverse data sources, addressing systemic biases, and prioritising ethical practices. These advancements will ensure that research contributions are assessed and valued in ways that reflect their multifaceted and evolving influence across academic, societal, and policy domains.

5.4. Implications and Limitations of the Study

The findings of this study have significant implications for academic institutions, research organisations, and policymakers by offering a transition from static research evaluations to dynamic, predictive frameworks. For universities and research institutions, integrating machine learning (ML) and artificial intelligence (AI) into bibliometric analysis enables trend forecasting, identification of high-impact research fields, and optimisation of resource allocation. AI-powered insights from co-authorship and citation networks allow institutions to strengthen interdisciplinary collaborations and enhance institutional research impact strategically [19,121].
ML-driven models provide early indicators of research impact for policymakers and funding agencies, guiding evidence-based funding decisions and policy development. By aligning research investment with societal and economic priorities, these predictive tools support efficient resource distribution while ensuring that emerging, high-potential research areas receive adequate support [10,13]. The ability of AI models to analyse evolving research dynamics enhances long-term strategic planning, fostering a more adaptive and forward-looking research ecosystem.
Methodologically, this study demonstrates the versatility of advanced bibliometric tools, including BERTopic, CiteSpace, Bibliometrix, VOSviewer, and Gephi, in analysing citation trends, collaboration structures, and thematic evolution. BERTopic’s BERT-based embeddings and HDBSCAN clustering enable granular topic modelling, while CiteSpace’s citation burst analysis provides insights into rapidly growing research areas. These tools, when combined, offer a robust analytical framework for examining the intersection of AI/ML with research impact assessment.
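After the embedding, UMAP, and HDBSCAN stages described above, BERTopic extracts topic keywords with class-based TF-IDF (c-TF-IDF), weighting each term as tf(t, c) · log(1 + A / f(t)), where A is the average number of words per cluster and f(t) the term’s corpus-wide frequency. A stdlib sketch of that weighting, with toy clusters standing in for HDBSCAN output:

```python
import math
from collections import Counter

# Toy clusters standing in for HDBSCAN output over document embeddings.
clusters = {
    0: "citation impact citation network analysis",
    1: "topic modelling embedding topic cluster",
}

# Class-based TF-IDF: weight = tf(term, class) * log(1 + A / f(term)).
class_tf = {c: Counter(text.split()) for c, text in clusters.items()}
corpus_tf = Counter()
for tf in class_tf.values():
    corpus_tf.update(tf)
avg_words = sum(corpus_tf.values()) / len(clusters)

def top_terms(c: int, k: int = 2) -> list[str]:
    """Return the k highest-weighted terms for cluster c."""
    weights = {t: tf * math.log(1 + avg_words / corpus_tf[t])
               for t, tf in class_tf[c].items()}
    return sorted(weights, key=weights.get, reverse=True)[:k]

print(top_terms(0))  # terms most distinctive of cluster 0
print(top_terms(1))
```

The keyword lists reported in Appendix A are the large-scale analogue of these ranked term lists, one per discovered topic.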
However, despite its methodological rigour, this study has several limitations. First, the exclusive reliance on the Web of Science Core Collection introduces selection bias, potentially underrepresenting non-English publications, regional research output, and work from less prominent institutions. Expanding future analyses to Scopus, PubMed, and Google Scholar would improve coverage and mitigate database-driven biases. Second, the study focuses solely on English-language publications, excluding valuable contributions published in other languages, which may distort the global representation of research impact.
The use of ML-based topic modelling introduces another set of biases. The BERT-based embeddings used in BERTopic are influenced by their training datasets, potentially amplifying dominant research themes while underrepresenting niche or emerging areas. Additionally, HDBSCAN clustering and UMAP dimensionality reduction may overlook subtle variations in research topics, impacting the granularity of theme identification. While AI-driven bibliometric tools enhance analytical efficiency, they should not be used in isolation but complemented by expert-driven qualitative analysis for balanced insights.
From an ethical perspective, AI-based impact assessments raise concerns regarding algorithmic bias, transparency, and data ethics. Citation-based metrics inherently favour well-established researchers and institutions, potentially reinforcing existing inequalities in academic visibility. Furthermore, the black-box nature of ML algorithms poses challenges in ensuring transparency and interpretability. To mitigate these risks, it is essential to incorporate explainable AI (XAI) frameworks, conduct regular audits of model outputs, and ensure that evaluative frameworks consider qualitative dimensions alongside quantitative citation data.
Finally, this study is temporally constrained, analysing publications up to August 2024, potentially missing emerging paradigm shifts in ML and AI applications for research impact assessment. Given the rapid evolution of AI technologies, future studies should explore longitudinal approaches, cross-disciplinary citation behaviours, and alternative impact indicators beyond bibliometric measures to develop a more holistic research evaluation framework.

6. Conclusions

This study examined the transformative role of machine learning (ML) and artificial intelligence (AI) in research impact evaluation, focusing on the current status, methodologies, and future directions of these advanced computational techniques. Through sophisticated bibliometric methods—including co-authorship analysis, network visualisation, and citation analysis—and advanced topic modelling, the research illuminated critical insights into emerging trends, influential contributors, and evolving frameworks. The findings underscore the potential of ML and AI to revolutionise traditional metrics by enabling nuanced analyses, such as predicting future research directions, identifying interdisciplinary contributions, and uncovering latent patterns, thereby advancing a multidimensional approach to evaluating research impact.
Initially, the study explored the current state of research impact evaluation, revealing a paradigm shift from static bibliometric methods to dynamic, ML/AI-driven frameworks. This evolution is characterised by integrating computational and predictive methodologies that offer more profound and comprehensive insights into scholarly influence. Tools like CiteSpace and BERTopic were pivotal in identifying temporal trends, uncovering citation bursts, and mapping emerging themes. CiteSpace’s citation burst analysis highlighted pivotal moments in research trajectories, while BERTopic’s transformer-based embeddings provided granular insights into evolving thematic connections and emerging research areas. Furthermore, the study revealed the global adoption of ML/AI methodologies, with significant contributions from China and the United States collectively accounting for over 40% of the research output. Institutions such as Wuhan University and the Education University of Hong Kong were identified as leaders in driving innovation, demonstrating the importance of strategic regional investments and institutional leadership in advancing data-driven research impact assessment.
The second focus of the study examined prominent methodologies and emerging research themes in applying ML and AI to research impact assessment. Predictive modelling emerged as a transformative approach, enabling accurate forecasts of citation trajectories and scholarly trends. Techniques such as decision trees, neural networks, and ensemble methods integrated diverse variables—including publication venues, early citation counts, and co-authorship dynamics—to provide robust predictions of future academic influence. Natural language processing (NLP) techniques, including sentiment analysis, added a qualitative dimension by analysing the tone, polarity, and contextual relevance of citations and publications. These methodologies underscored the potential of ML and AI to enrich research impact evaluations with predictive, qualitative, and data-driven innovations. Emerging themes included the growing prominence of altmetrics as complementary indicators, the prioritisation of equity and inclusivity in research evaluations, and the integration of AI in addressing global challenges such as sustainability and public health.
Building on these findings, the study identified future directions for advancing research impact evaluation through computational techniques. The development of frameworks integrating societal, economic, and policy dimensions with traditional academic metrics was highlighted as essential for addressing the multifaceted nature of research contributions. Fairness-aware algorithms were critical for mitigating systemic biases and ensuring equitable recognition of underrepresented regions and disciplines. Including unconventional data sources—such as policy documents, patents, and social media analytics—was vital for enriching contextual understanding and societal resonance. Additionally, the study advocated for real-time impact monitoring systems powered by streaming analytics and AI to enable continuous, dynamic evaluations across academic and non-academic domains. Ethical considerations were also prioritised, with calls for transparent, inclusive, and accountable methodologies that align computational practices with global priorities and stakeholder needs. These forward-looking approaches aim to transform research impact evaluation into a more adaptive, inclusive, and actionable process capable of addressing the complexities of global research ecosystems.
Overall, the study provides a comprehensive understanding of the evolving landscape of research impact evaluation. Bibliometric tools such as VOSviewer and Gephi offered detailed mappings of interactions among scholars, institutions, and countries, while BERTopic modelling revealed nuanced thematic patterns and interdisciplinary applications. Collectively, these methodologies highlighted the dynamic evolution of research contributions and provided actionable insights for optimising resource allocation, fostering collaborations, and enhancing the societal relevance of research outputs.
From a practical standpoint, the findings of this study offer valuable insights for academic institutions, policymakers, and funding agencies. Integrating AI and ML in research evaluation enables real-time impact assessment, allowing institutions to identify high-impact research fields, optimise funding distribution, and strengthen international collaborations. Policymakers can leverage ML-driven citation predictions and network analyses to inform evidence-based policy decisions, ensuring that research investments align with societal and economic priorities. Furthermore, AI-enhanced bibliometric tools provide a data-driven framework to assess interdisciplinary collaborations, helping institutions refine strategic research agendas and encouraging cross-disciplinary innovation.
Despite these advancements, several limitations must be acknowledged. The reliance on the Web of Science Core Collection and the exclusion of non-English literature introduce potential biases that may limit the generalisability of the findings. Future studies should consider incorporating broader databases such as Scopus, Google Scholar, and PubMed to ensure a more inclusive representation of global research impact. Additionally, inherent biases in ML models and bibliometric tools, such as BERTopic and CiteSpace, may influence the granularity and inclusivity of results, potentially overrepresenting dominant themes while underrepresenting emerging or niche research areas. Addressing these biases requires continuous improvements in algorithmic transparency and regular validation against expert-driven evaluations to enhance interpretability and fairness in research impact assessments.
The study’s temporal scope is another limitation, as it only considers research published up to August 2024, potentially missing the latest trends and paradigm shifts in ML and AI applications for research impact evaluation. The rapid evolution of AI technologies necessitates ongoing research incorporating longitudinal approaches, real-time data updates, and alternative impact indicators beyond citation metrics. Future studies should explore how AI can assess broader dimensions of research impact, such as policy influence, societal engagement, and technological innovation, to develop a more holistic and equitable research evaluation framework.
In conclusion, the study makes significant contributions to the scientometric and research impact fields, setting a precedent for future investigations. The bibliometric analysis provided a comprehensive view of the research ecosystem by identifying key contributors, influential institutions, and collaboration patterns across countries and publications. The BERTopic modelling extended these insights by uncovering emerging thematic trends, highlighting the dynamic evolution of research topics, and predicting future directions in interdisciplinary and AI-driven research. These findings emphasise the necessity of continual innovation and adaptation in research methodologies to keep pace with rapid technological advancements and the global research landscape. Furthermore, the implications drawn from the analysis provide actionable insights for institutions, researchers, and policymakers, empowering them to enhance research impact evaluation and foster strategic collaborations. Despite the identified limitations, such as data curation challenges and potential algorithmic biases, the study’s contributions underscore the transformative potential of ML and AI in shaping the future of research evaluation.

Author Contributions

Conceptualisation, M.H.A., O.M., A.A.M. and A.J.H.; methodology, M.H.A., O.M., A.A.M. and I.A.K.; software, M.H.A. and A.J.H.; validation, M.H.A. and A.J.H.; formal analysis, M.H.A.; investigation, M.H.A.; resources, M.H.A.; data curation, M.H.A., I.A.K. and A.J.H.; writing—original draft preparation, M.H.A.; writing—review and editing, M.H.A., O.M., A.A.M., I.A.K. and A.J.H.; visualisation, M.H.A.; supervision, O.M. and A.A.M.; project administration, M.H.A., O.M. and A.A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data supporting the reported results in this study have been uploaded and are publicly available in the Web of Science Text file format at the following link: https://raw.githubusercontent.com/mharsalan/ResearchImpact/refs/heads/main/WoS_MLAI_for_Research_Impact_8August2024.txt, (accessed on 21 March 2025). No new data were created during this study. The data used in this research are available for public access and can be freely downloaded and analysed.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Topic Modelling Outcomes

Table A1. Topic modelling results and scholarly interpretation.
Scholarly Interpretation | Topic Modelling Result
Group | Interpreted Topic Label | Topic | Count | LLama2 Label | Name | Representation | KeyBERT | MMR
Bibliometric and Scientometric Analysis in AI and ML ResearchTopic Modelling and Research Trend Analysis in Scholarly Literature0215Research Impact Analysis and Prediction0_topic_modeling_influence_analysis[‘topic’, ‘modeling’, ‘influence’, ‘analysis’, ‘text’, ‘trend’, ‘network’, ‘processing’, ‘social’, ‘mining’, ‘natural’, ‘science’, ‘language’, ‘latent’, ‘evolution’, ‘dirichlet’, ‘allocation’, ‘technology’, ‘scientometrics’, ‘patent’, ‘lda’, ‘bibliometrics’, ‘model’, ‘research’, ‘using’, ‘information’, ‘citation’, ‘emerging’, ‘study’, ‘bibliometric’, ‘knowledge’, ‘scientific’, ‘twitter’, ‘similarity’, ‘dynamic’, ‘clustering’, ‘medium’, ‘detection’, ‘prediction’, ‘collaboration’, ‘approach’, ‘machine’, ‘literature’, ‘data’, ‘publication’, ‘library’, ‘forecasting’, ‘extraction’, ‘identifying’, ‘word’, ‘cocitation’, ‘innovation’, ‘community’, ‘structure’, ‘index’, ‘semantic’, ‘news’, ‘speech’, ‘review’, ‘learning’, ‘interdisciplinarity’, ‘brain’, ‘nlp’, ‘hierarchical’, ‘indicator’, ‘time’, ‘sentiment’, ‘based’, ‘technological’, ‘cluster’, ‘bibliographic’, ‘translation’, ‘path’, ‘framework’, ‘document’, ‘scientometric’, ‘diffusion’, ‘analyzing’, ‘market’, ‘author’, ‘profiling’, ‘informationscience’, ‘convergence’, ‘classification’, ‘graph’, ‘method’, ‘centrality’, ‘series’, ‘event’, ‘detecting’, ‘topical’, ‘trending’, ‘mass’, ‘coword’, ‘core’, ‘management’, ‘hybrid’, ‘impact’, ‘chinese’, ‘feature’][‘scientometrics’, ‘bibliometrics’, ‘scientometric’, ‘bibliometric’, ‘informationscience’, ‘latent’, ‘topical’, ‘topic’, ‘centrality’, ‘classification’][‘scientometrics’, ‘lda’, ‘bibliometrics’, ‘bibliometric’, ‘interdisciplinarity’, ‘nlp’, ‘scientometric’, ‘informationscience’, ‘centrality’, ‘trending’]
Research Evaluation Using Altmetrics and Citation Analysis in Scholarly Communication165Impact Analysis and Prediction in Scholarly Research1_altmetrics_twitter_sentiment_social[‘altmetrics’, ‘twitter’, ‘sentiment’, ‘social’, ‘regression’, ‘medium’, ‘linear’, ‘tweet’, ‘emotional’, ‘analysis’, ‘data’, ‘big’, ‘opinion’, ‘multiple’, ‘journal’, ‘evaluation’, ‘impact’, ‘altmetric’, ‘utility’, ‘scholarly’, ‘research’, ‘communication’, ‘article’, ‘influence’, ‘review’, ‘academic’, ‘quality’, ‘based’, ‘government’, ‘satisfaction’, ‘public’, ‘medicine’, ‘perception’, ‘effect’, ‘score’, ‘citation’, ‘machine’, ‘attachment’, ‘microblog’, ‘network’, ‘user’, ‘set’, ‘scholar’, ‘air’, ‘metric’, ‘facebook’, ‘fuzzy’, ‘online’, ‘news’, ‘life’, ‘researcher’, ‘approach’, ‘geographically’, ‘mentioning’, ‘science’, ‘learning’, ‘subjective’, ‘scale’, ‘agreement’, ‘pca’, ‘factor’, ‘ranking’, ‘model’, ‘usage’, ‘excellence’, ‘humanity’, ‘study’, ‘bias’, ‘significance’, ‘google’, ‘predicting’, ‘statistic’, ‘weighted’, ‘time’, ‘book’, ‘influential’, ‘attention’, ‘feature’, ‘mining’, ‘technology’, ‘legal’, ‘correlation’, ‘typology’, ‘dsrs’, ‘scenic’, ‘lightweight’, ‘housing’, ‘specialized’, ‘societal’, ‘retweet’, ‘researchgate’, ‘mention’, ‘neurosexism’, ‘goodreads’, ‘youtube’, ‘timeweighted’, ‘neuraminidase’, ‘psolstm’, ‘revised’, ‘renewal’][‘altmetrics’, ‘altmetric’, ‘weighted’, ‘ranking’, ‘scholarly’, ‘influential’, ‘researchgate’, ‘impact’, ‘researcher’, ‘twitter’][‘altmetrics’, ‘twitter’, ‘altmetric’, ‘citation’, ‘factor’, ‘ranking’, ‘weighted’, ‘influential’, ‘researchgate’, ‘timeweighted’]
Bibliometric Analysis of Artificial Intelligence Research254Artificial Intelligence for Research Impact Analysis2_language_chatgpt_artificial_chatbot[‘language’, ‘chatgpt’, ‘artificial’, ‘chatbot’, ‘intelligence’, ‘writing’, ‘generative’, ‘ai’, ‘automated’, ‘education’, ‘english’, ‘bibliometric’, ‘learning’, ‘chatbots’, ‘teacher’, ‘conversational’, ‘assessment’, ‘tutoring’, ‘analysis’, ‘natural’, ‘intelligent’, ‘chemistry’, ‘processing’, ‘essay’, ‘foreignlanguage’, ‘efl’, ‘agent’, ‘bibliometrics’, ‘organic’, ‘large’, ‘research’, ‘emergency’, ‘personalized’, ‘validity’, ‘use’, ‘future’, ‘feedback’, ‘review’, ‘trend’, ‘professional’, ‘literature’, ‘second’, ‘evaluation’, ‘comprehensive’, ‘learner’, ‘comprehension’, ‘technology’, ‘foreign’, ‘exploring’, ‘year’, ‘metaanalysis’, ‘cocitation’, ‘biology’, ‘model’, ‘systematic’, ‘hallucination’, ‘intelligenceassisted’, ‘resuscitation’, ‘secondyear’, ‘technologybased’, ‘metaanalytic’, ‘triage’, ‘council’, ‘learningsystem’, ‘obgyn’, ‘captcha’, ‘ajg’, ‘l2’, ‘aiassisted’, ‘whatsapp’, ‘gpt’, ‘breaking’, ‘biofeedback’, ‘altmetrics’, ‘application’, ‘artificialintelligence’, ‘testing’, ‘congress’, ‘written’, ‘breast’, ‘jcr’, ‘practice’, ‘system’, ‘mechanism’, ‘crosssectional’, ‘chatgpt35’, ‘bot’, ‘survey’, ‘student’, ‘science’, ‘dimensionsai’, ‘viewer’, ‘realtime’, ‘interpretation’, ‘vos’, ‘snip’, ‘reconstruction’, ‘reaction’, ‘teaching’, ‘knowledge’][‘bibliometric’, ‘bibliometrics’, ‘altmetrics’, ‘artificialintelligence’, ‘ai’, ‘chatbots’, ‘aiassisted’, ‘chatbot’, ‘survey’, ‘systematic’][‘chatgpt’, ‘ai’, ‘chatbots’, ‘assessment’, ‘learningsystem’, ‘captcha’, ‘aiassisted’, ‘altmetrics’, ‘artificialintelligence’, ‘bot’]
Bibliometric and Scientometric Analysis of Artificial Intelligence in Education943Artificial Intelligence in Education and Research9_education_intelligence_artificial_bibliometric[‘education’, ‘intelligence’, ‘artificial’, ‘bibliometric’, ‘chatgpt’, ‘higher’, ‘educational’, ‘technology’, ‘generative’, ‘ai’, ‘analysis’, ‘nursing’, ‘student’, ‘academic’, ‘research’, ‘review’, ‘literature’, ‘entrepreneurship’, ‘learning’, ‘specialization’, ‘20132023’, ‘personalised’, ‘special’, ‘speech’, ‘plagiarism’, ‘curriculum’, ‘integrity’, ‘analytics’, ‘trend’, ‘interdisciplinary’, ‘future’, ‘entrepreneurial’, ‘environment’, ‘year’, ‘science’, ‘language’, ‘direction’, ‘knowledge’, ‘systematic’, ‘bard’, ‘3dgpt’, ‘iraq’, ‘internationalisation’, ‘competencybased’, ‘dishonesty’, ‘challenge’, ‘bibliometrics’, ‘citespace’, ‘performance’, ‘school’, ‘development’, ‘technologyenhanced’, ‘field’, ‘mapping’, ‘unveiling’, ‘mechanical’, ‘profession’, ‘secondary’, ‘map’, ‘engineering’, ‘africa’, ‘style’, ‘decade’, ‘management’, ‘need’, ‘content’, ‘soft’, ‘study’, ‘institution’, ‘implication’, ‘topic’, ‘application’, ‘website’, ‘tutoring’, ‘pedagogy’, ‘emerging’, ‘natural’, ‘strategic’, ‘capability’, ‘agenda’, ‘literacy’, ‘european’, ‘business’, ‘adaptive’, ‘orientation’, ‘web’, ‘40’, ‘manufacturing’, ‘country’, ‘artificialintelligence’, ‘teaching’, ‘insight’, ‘evolution’, ‘landscape’, ‘online’, ‘perception’, ‘scientific’, ‘covid19’, ‘role’, ‘vosviewer’][‘bibliometric’, ‘bibliometrics’, ‘artificialintelligence’, ‘systematic’, ‘education’, ‘ai’, ‘institution’, ‘academic’, ‘knowledge’, ‘educational’][‘education’, ‘bibliometric’, ‘chatgpt’, ‘plagiarism’, ‘analytics’, ‘bard’, ‘citespace’, ‘emerging’, ‘artificialintelligence’, ‘online’]
Research Impact Through Bibliometric, Scientometric, and Multivariate Analysis1039Machine Learning for Research Impact Analysis10_principal_component_heuristic_bibliometricsbased[‘principal’, ‘component’, ‘heuristic’, ‘bibliometricsbased’, ‘performance’, ‘evaluation’, ‘research’, ‘regression’, ‘indicator’, ‘analysis’, ‘frugal’, ‘university’, ‘logistic’, ‘hindex’, ‘agricultural’, ‘scientific’, ‘psychological’, ‘index’, ‘website’, ‘science’, ‘decision’, ‘sector’, ‘institute’, ‘researcher’, ‘personal’, ‘rd’, ‘model’, ‘impact’, ‘public’, ‘method’, ‘appraisal’, ‘bibliometrics’, ‘measure’, ‘higher’, ‘output’, ‘multivariate’, ‘development’, ‘innovation’, ‘cost’, ‘using’, ‘institution’, ‘investment’, ‘level’, ‘mmre’, ‘pred’, ‘herzegovina’, ‘revolutionary’, ‘bosnia’, ‘nobel’, ‘estonia’, ‘ordinal’, ‘prizewinning’, ‘coping’, ‘mainland’, ‘journal’, ‘efficiency’, ‘chinese’, ‘production’, ‘activity’, ‘data’, ‘multinomial’, ‘thesis’, ‘restart’, ‘subsidy’, ‘display’, ‘staff’, ‘factor’, ‘ranking’, ‘decisionmaking’, ‘growth’, ‘tree’, ‘judgment’, ‘wellbeing’, ‘department’, ‘envelopment’, ‘lens’, ‘citation’, ‘based’, ‘delphi’, ‘visibility’, ‘confidence’, ‘agriculture’, ‘definition’, ‘bibliometric’, ‘linear’, ‘big’, ‘stress’, ‘coauthor’, ‘individual’, ‘cell’, ‘uncertainty’, ‘policy’, ‘productivity’, ‘choice’, ‘doctoral’, ‘disciplinary’, ‘economic’, ‘stem’, ‘case’, ‘difference’][‘bibliometric’, ‘bibliometrics’, ‘bibliometrics-based’, ‘ranking’, ‘citation’, ‘evaluation’, ‘institution’, ‘appraisal’, ‘envelopment’, ‘multivariate’][‘heuristic’, ‘bibliometrics-based’, ‘indicator’, ‘institute’, ‘bibliometrics’, ‘multivariate’, ‘ranking’, ‘envelopment’, ‘citation’, ‘bibliometric’]
Bibliometric and Scientometric Analysis of Machine Learning and Educational Technology1826Artificial Intelligence and Scientific Impact Analysis18_intelligence_artificial_bibliometric_analysis[‘intelligence’, ‘artificial’, ‘bibliometric’, ‘analysis’, ‘adoption’, ‘science’, ‘technology’, ‘maritime’, ‘crime’, ‘study’, ‘pattern’, ‘ai’, ‘trend’, ‘management’, ‘humancomputer’, ‘bibliometrics’, ‘system’, ‘vehicle’, ‘thematic’, ‘knowledge’, ‘branding’, ‘unscented’, ‘luxury’, ‘agentbased’, ‘kalman’, ‘autonomous’, ‘ukf’, ‘suspicious’, ‘production’, ‘application’, ‘applied’, ‘artificialintelligence’, ‘filter’, ‘mapping’, ‘scientometric’, ‘transformerbased’, ‘research’, ‘basic’, ‘union’, ‘property’, ‘decade’, ‘scenario’, ‘robotics’, ‘evolution’, ‘interaction’, ‘air’, ‘tracking’, ‘space’, ‘scientific’, ‘entrepreneurial’, ‘intelligent’, ‘keyword’, ‘strategy’, ‘recent’, ‘european’, ‘transformation’, ‘complex’, ‘coword’, ‘model’, ‘indicator’, ‘40’, ‘activity’, ‘prediction’, ‘intellectual’, ‘quality’, ‘landscape’, ‘domain’, ‘impact’, ‘generative’, ‘output’, ‘automatic’, ‘vosviewer’, ‘innovation’, ‘19822019’, ‘iapplied’, ‘aerial’, ‘socialmedia’, ‘kg4ai’, ‘30th’, ‘19602021’, ‘strategicmanagement’, ‘iso’, ‘iscimati’, ‘20072016’, ‘auv’, ‘tendency’, ‘aerospace’, ‘unmanned’, ‘anniversary’, ‘hcii’, ‘aiinfused’, ‘navigation’, ‘adapts’, ‘scopusbased’, ‘90012015’, ‘competitiveness’, ‘involving’, ‘right’, ‘burberry’, ‘dispersion’][‘scientometric’, ‘bibliometric’, ‘bibliometrics’, ‘scopusbased’, ‘artificialintelligence’, ‘ai’, ‘humancomputer’, ‘intelligence’, ‘aiinfused’, ‘knowledge’][‘trend’, ‘bibliometrics’, ‘kalman’, ‘artificialintelligence’, ‘scientometric’, ‘complex’, ‘19822019’, ‘auv’, ‘unmanned’, ‘aiinfused’]
Theme: Bibliometric and ML/AI-Enhanced Analysis of Sustainable Development and Technological Innovations
Topic 16 (27 articles). Label: Artificial Intelligence for Business Management and Innovation. Name: 16_business_management_cocitation_digital
Keywords: business, management, cocitation, digital, modeling, creativity, topic, intellectual, structure, transformation, innovation, analysis, knowledge, industry, marketing, acceptance, system, entrepreneurship, field, bibliometric, model, manufacturing, 40, logistics, informationtechnology, evolution, dynamic, big, hot, cooccurrence, trend, technology, citation, economics, scientometric, waste, science, strategic, iiot, utaut, territorial, fieldconfiguring, selfservice, chain, supply, bibliometrics, healthcare, analytics, data, pharmacy, landscape, modelling, electric, intelligence, approach, rural, customer, trust, digitalization, forecasting, informationsystems, software, service, artificial, social, selfcitation, advanced, corporate, datadriven, intention, governance, mobility, culture, impact, keyword, strategy, optimization, event, future, adoption, industrial, coword, understanding, medium, mapping, component, engineering, environmental, construction, neuralnetworks, decisionmaking, finance, development, design, research, kads, introduction, germany, hospitality, gephi
Representative terms (set 1): bibliometric, bibliometrics, scientometric, digitalization, citation, informationtechnology, knowledge, cocitation, analytics, dynamic
Representative terms (set 2): cocitation, entrepreneurship, 40, dynamic, trend, scientometric, iiot, bibliometrics, analytics, digitalization
Theme: Bibliometric and Scientometric Analysis of Machine Learning and Educational Technology
Topic 26 (16 articles). Label: Machine Learning for Education Research Impact Analysis. Name: 26_analytics_modeling_topic_education
Keywords: analytics, modeling, topic, education, educational, preservation, analysis, contributor, bibliometric, decade, system, machine, technology, syllabus, learning, structural, rpackage, mainstream, bibliometrics, bibliometrix, multimodal, network, conference, digital, information, insight, social, blockchain, predictive, trend, based, technologyi, ibritish, tinyml, tiny, fontana, salient, cs1an, revalidation, presented, deciphering, disadvantage, edm, edtech, learningexperiences, universitystudents, academicachievement, tinymledu, smart, year, predicting, mi, analyzer, 20082019, embedded, management, data, package, twentyfive, aspect, higher, research, point, initiative, ict, repository, open, collaboration, networking, cooccurrence, learningbased, negative, highereducation, openaccess, joint, privacy, theme, impact, keyword, material, mining, problem, success, coword, knowledge, task, analyzing, journal, institutional, school, applied, study, acceptance, intellectual, satisfaction, using, covid19, work, metadata, teacher
Representative terms (set 1): bibliometrics, bibliometric, bibliometrix, edtech, presented, institutional, learningexperiences, education, learningbased, knowledge
Representative terms (set 2): analytics, education, contributor, bibliometrics, bibliometrix, edtech, learningexperiences, tinymledu, learningbased, institutional
Theme: Bibliometric and Scientometric Analysis of Artificial Intelligence Research Trends and Applications
Topic 25 (17 articles). Label: Machine Learning for Bibliometric Analysis. Name: 25_bibliometric_face_visualization_failure
Keywords: bibliometric, face, visualization, failure, fuzzy, trend, machine, distribution, analysis, mapping, intelligence, visual, damage, artificial, recognition, neuralnetwork, birnbaumsaunders, fatigue, streamflow, textile, epilepsy, cumulative, softwaredefined, deepfakes, cartography, anesthesia, citespace, prediction, land, log, deep, learning, networking, vector, defect, life, pattern, material, detection, thematic, knowledge, support, classification, manufacturing, neural, statistical, network, algorithm, neuralnetworks, representation, evolution, landscape, vision, map, vosviewer, bibliometrics, model, research, coordinate, period, organiccarbon, iexpert, 2004, inspection, waterlevel, 2dimensional, vosviewerbased, 19902019, landslide, soil, bibliometrical, subfields, microassembly, dematel, roughness, riskevaluation, proposition, fnn, radical, preserving, corrosion, suitability, mle, diagnostics, avoid, truncated, fabric, fake, aisdn, susceptibility, applicationsi, year, science, technique, rstudio, logisticregression, condition, prominent, seizure, cosine
Representative terms (set 1): bibliometric, bibliometrics, bibliometrical, classification, citespace, knowledge, neuralnetwork, neuralnetworks, research, statistical
Representative terms (set 2): machine, textile, deepfakes, citespace, classification, neuralnetworks, bibliometrics, model, iexpert, aisdn
Theme: Bibliometric Analysis and Research Evaluation of Metaheuristic Algorithms
Topic 3 (51 articles). Label: Optimization and Machine Learning in Research Impact Analysis. Name: 3_optimization_algorithm_swarm_particle
Keywords: optimization, algorithm, swarm, particle, genetic, evolutionary, quantum, error, evaluation, pso, metaheuristics, colony, research, improved, ant, oxidation, design, problem, reliability, independent, combinatorial, function, bibliometric, cognitive, based, law, station, flatness, psosvm, subway, dance, inspired, seismic, camouflage, quantuminspired, computing, network, metaheuristic, method, effect, radio, annealing, art, performance, simulated, order, search, analysis, scheduling, signal, journal, flexible, component, multiobjective, model, fusion, catalyst, dysfunction, camoevo, pursuit, vanet, scanning, mdo, rod, deteriorating, roundness, fishery, bioeconomic, multipopulation, pipe, shop, array, wideangle, rectal, vehicular, connecting, waterlogging, compression, antenna, kinetics, holographic, jobshop, chaos, phased, neural, application, selection, strategy, resilience, objective, thyroid, routing, riskassessment, managerial, sampling, nature, disaster, projection, disruption, completeness
Representative terms (set 1): metaheuristics, metaheuristic, pso, algorithm, psosvm, bibliometric, optimization, method, search, multiobjective
Representative terms (set 2): evolutionary, pso, metaheuristics, ant, quantuminspired, computing, metaheuristic, simulated, search, vanet
Theme: Causal Inference in Research Evaluation and Bibliometric Impact Analysis
Topic 4 (51 articles). Label: Causal Inference and Machine Learning for Research Impact Analysis. Name: 4_causal_inference_bayesian_risk
Keywords: causal, inference, bayesian, risk, effect, propensity, research, outcome, network, qualitative, estimation, policy, health, assessment, bias, method, observational, score, trial, comparative, randomization, acyclic, psychotherapy, directed, mendelian, evaluation, equipment, maintenance, epidemiology, open, journal, clinical, citation, access, based, green, medication, retracted, retraction, covariate, balance, counselling, science, effectiveness, mediation, enrollment, service, regression, impact, analysis, statistical, study, matching, injury, disease, care, probabilistic, situation, design, forecasting, fault, association, collaboration, environment, potential, interaction, randomized, significance, process, time, model, communication, simulation, feedback, hazardous, highrisk, caseonly, train, cardiovascular, endogeneity, counterfactuals, cornerstone, fall, contractor, routine, causation, moderation, gut, battlefield, ucav, microbiota, subclassification, pedestrian, graph, question, dynamic, testing, empirical, information, adjustment
Representative terms (set 1): impact, empirical, citation, causal, endogeneity, covariate, effectiveness, research, study, effect
Representative terms (set 2): causal, bayesian, propensity, method, acyclic, mendelian, covariate, effectiveness, endogeneity, counterfactuals
Theme: Bibliometric and Machine Learning Approaches to Analysing Research Trends and Impact in Artificial Intelligence
Topic 20 (20 articles). Label: Machine Learning for Bibliometric Analysis. Name: 20_neural_network_categorization_inventory
Keywords: neural, network, categorization, inventory, biomedical, artificial, bibliometric, technique, machine, bullwhip, chemometrics, metamethodology, data, gas, nanoparticles, associated, control, mining, oil, bibliometrics, india, keywords, logic, method, document, diagram, cell, learning, using, number, excellence, synthesis, system, law, financial, classification, supply, management, analysis, overview, fuzzy, article, scientometrics, application, collagen, neurofuzzy, chemometric, paleontology, networksbased, 19912014, toxicity, 19801991, reemergence, modelpredictive, bone, requires, businessrelated, nanomaterial, spotlight, gat, mark, erythrocytemembrane, encapsulation, europeanunion, membranecoated, coordination, colombia, taphonomy, historiography, hopfield, standardised, feedforward, explore, lab, graphicacy, kohonen, industry, scholarly, research, colombian, membrane, decisionsupportsystem, machinelearningbased, mini, biodiversity, biological, examine, lotka, retrieval, decision, intelligent, citation, information, cut, demand, lead, population, routing, instrumental, great
Representative terms (set 1): bibliometrics, bibliometric, scientometrics, classification, chemometrics, categorization, citation, chemometric, machinelearningbased, networksbased
Representative terms (set 2): neural, chemometrics, metamethodology, bibliometrics, classification, scientometrics, neurofuzzy, decisionsupportsystem, machinelearningbased, citation
Theme: Bibliometric and Information Retrieval Impact Analysis
Topic 11 (37 articles). Label: Machine Learning for Information Retrieval and Citation Analysis in Medical Research. Name: 11_database_search_recall_medline
Keywords: database, search, recall, medline, bibliographic, precision, retrieval, federated, chemical, heading, subject, systematic, medical, strategy, literature, pubmed, searching, controlled, review, randomized, embase, information, care, vocabulary, mesh, trial, health, study, rate, data, learning, identifying, controlledtrials, substance, thesaurus, hemorrhage, survival, multiinstitutional, machine, best, medicine, deep, semisupervised, evidencebased, crowdsourcing, clinical, area, accuracy, record, model, test, inspec, toxicological, uncontrolled, psychosocial, curation, congenital, occupationalhealth, prostate, omop, intracerebral, communityengaged, expansion, prognosis, hypothyroidism, lilac, musculoskeletal, decision, abstract, biomedical, gpt4, augmentation, risk, evidence, representation, support, selected, transformerbased, query, guideline, relevant, performance, confidence, open, collaboration, evaluation, outcome, informatics, patient, improving, stroke, validation, synthesis, access, cancer, common, transfer, engine, statistic, critical
Representative terms (set 1): pubmed, medline, search, database, retrieval, searching, embase, systematic, bibliographic, review
Representative terms (set 2): search, medline, pubmed, embase, mesh, identifying, thesaurus, semisupervised, crowdsourcing, communityengaged
Theme: Bibliometric Analysis of Deep Learning and Machine Learning Technologies in Various Domains
Topic 29 (13 articles). Label: Machine Learning for Bibliometric Analysis and Research Impact Prediction. Name: 29_deep_disease_object_bibliometric
Keywords: deep, disease, object, bibliometric, screening, fraud, kidney, credit, learning, resistance, card, convolutional, vision, detection, classification, rating, microsimulation, chronic, antimicrobial, leaf, computer, machine, literature, 3d, analysis, agency, fulltext, bibliometricenhanced, ensemble, systematic, lstm, cancer, orientation, neural, algorithm, application, automatic, vosviewer, review, economics, transaction, nephropathy, renaldisease, mitosis, iga, faultdiagnosis, functionbased, glomerularfiltrationrate, gradebased, skincancer, exclusively, progression, costeffectiveness, 20142024, ckd, visualizationbased, antibiotic, 20072019, occurs, occf, recognition, text, decline, bond, 2015, accountable, qualityoflife, lungcancer, retrieval, intelligent, analytics, stage, population, risk, health, future, online, model, network, bidirectional, leveraging, bibliometrics, biblioshiny, 2023, present, imaging, hot, era, index, governance, searching, cost, past, data, architecture, semantics, neuralnetwork, recent, attention, novel
Representative terms (set 1): classification, bibliometrics, review, bibliometric, convolutional, bibliometricenhanced, systematic, fulltext, neuralnetwork, neural
Representative terms (set 2): convolutional, classification, bibliometricenhanced, review, skincancer, ckd, analytics, leveraging, bibliometrics, neuralnetwork
Theme: Bibliometric and Scientometric Trends in ML/AI across Diverse Domains
Topic 22 (18 articles). Label: Artificial Intelligence for Autism Spectrum Disorder Research. Name: 22_robot_safety_disorder_military
Keywords: robot, safety, disorder, military, trend, stroke, autism, mastitis, intelligence, bibliometric, artificial, spectrum, rehabilitation, machine, gait, glove, global, hotspot, interaction, analysis, ai, poststroke, cow, defence, milk, learning, cuttingedge, security, sensor, depression, research, visualized, robotics, road, movement, child, operation, technology, transportation, metaanalysis, test, precision, monitoring, citespace, national, state, disease, data, survey, application, bibliometrics, ligament, pathological, svc, migraine, parkinsonsdisease, radiologyrelated, grasp, psychiatric, subclinical, 1992, riskfactors, robotic, thumb, farm, exoskeleton, wearable, runtime, dairycattle, culturaldifferences, biomarkers, cruciate, accelerometer, arat, dairy, cortical, walking, depressive, cerebralpalsy, depolarization, bovine, block, bipolar, natural, processing, service, technique, identification, medical, roc, anterior, arm, disability, spreading, condition, theoretical, humancentered, intelligent, connectivity, virtualreality
Representative terms (set 1): bibliometric, bibliometrics, survey, metaanalysis, robotic, ai, research, robot, citespace, robotics
Representative terms (set 2): safety, gait, ai, poststroke, research, parkinsonsdisease, robotic, exoskeleton, arm, disability
Theme: Machine Learning and Artificial Intelligence for Bibliometric Analysis / ML/AI-Driven Bibliometrics and Research Impact Assessment
Topic 14 (31 articles). Label: Machine Learning for Research Impact Analysis. Name: 14_machine_impact_peer_paper
Keywords: machine, impact, peer, paper, highly, citation, indicator, cited, count, learning, exercise, review, quality, article, journal, readability, feature, publication, research, bibliometrics, funding, textual, scientometrics, national, productivity, academic, finance, selection, ranking, importance, excellence, bias, cost, science, institute, predicting, behavior, predict, basin, pacific, blood, algorithmic, lung, excellent, decision, data, transparency, reader, disclosure, assessment, uk, highlycited, machinelearning, imbalanced, sleeping, beauty, heart, informetrics, bibliometric, accounting, alternative, assisted, perspective, institution, svm, metric, criterion, radiology, scientific, number, performance, evaluation, rating, intelligence, improve, book, characteristic, field, determinant, abstract, prediction, insight, forest, analysis, methodology, disambiguation, index, random, topic, handelsblatt, oa, oxygenuptake, habilitation, glmlss, fitness, o2, realestate, gini, highintensity, matthew
Representative terms (set 1): scientometrics, bibliometric, bibliometrics, informetrics, citation, cited, ranking, highlycited, publication, research
Representative terms (set 2): citation, review, readability, bibliometrics, scientometrics, ranking, predict, machinelearning, informetrics, bibliometric
Theme: ML/AI-Enhanced Bibliometric Analysis and Publication Impact Evaluation
Topic 15 (31 articles). Label: Artificial Intelligence for Scientific Collaboration and Citation Impact Analysis. Name: 15_artificial_intelligence_collaboration_journal
Keywords: artificial, intelligence, collaboration, journal, citation, impact, international, ranking, hindex, pattern, business, ai, paper, pagerank, publication, isi, science, index, proceeding, helix, institution, global, institutional, case, computerscience, quality, research, google, scientific, productivity, level, perplexity, agglomeration, pasteur, quadrant, hcindex, coordinated, gergm, asia, network, national, performance, computer, innovation, counting, gindex, triple, trend, analysis, novelty, russian, researcher, china, collaborative, reference, combination, comparing, academic, scholarly, conditional, leadership, development, scientist, world, competition, based, approach, factor, economy, conference, field, scholar, evaluating, study, environmental, measure, domain, article, perception, data, scopus, survey, coauthorship, design, thematical, uzzi, managementi, fellowship, mostcited, atypical, authoraffiliationindex, parsimonious, morphology, form, 1985, patentcited, topicsensitivity, independency, citerbased, ntuple
Representative terms (set 1): ranking, citation, institution, pagerank, scopus, coauthorship, scholarly, isi, institutional, authoraffiliationindex
Representative terms (set 2): citation, ranking, ai, pagerank, isi, index, conference, scopus, coauthorship, authoraffiliationindex
Theme: Machine Learning and Artificial Intelligence for Specific Dimension of Research Impact / ML/AI-driven Evaluation Metrics and Models for Research Performance and Innovation Assessment
Topic 5 (51 articles). Label: Machine Learning for Research Impact Analysis. Name: 5_project_evaluation_based_risk
Keywords: project, evaluation, based, risk, neural, performance, principal, research, component, method, network, safety, enterprise, selection, svm, management, model, deep, grey, assessment, comprehensive, manifold, sparse, hierarchy, linear, prediction, learning, universityindustry, algorithm, theory, error, analytic, matrix, wavelet, entropy, technology, human, regression, analysis, traffic, propagation, data, process, weighted, metric, investment, outward, underground, microwave, baijiu, truck, overrun, centrifugal, rtd, compressor, multiplier, stickslip, pose, straightness, consistency, openpit, tacit, phm, armored, uniformity, evidential, flavor, committime, sensory, passthrough, evacuation, intelligent, statistical, empirical, artificial, machine, innovation, exchange, loading, accident, voltage, fund, spss, black, equal, forecast, reasoning, construction, rate, hydrostatic, recurrent, supplier, heating, food, driving, vibration, crossvalidation, direct, rough, output
Representative terms (set 1): weighted, evaluation, project, crossvalidation, svm, empirical, research, spss, assessment, method
Representative terms (set 2): project, risk, neural, svm, algorithm, wavelet, weighted, truck, spss, crossvalidation
Theme: Sentiment and Citation Impact Analysis through Machine Learning Techniques
Topic 6 (48 articles). Label: Citation Analysis and Sentiment Measurement in Scientific Research. Name: 6_sentiment_citation_analysis_classification
Keywords: sentiment, citation, analysis, classification, opinion, scientific, multitask, impact, feature, intext, paper, negative, mining, emotion, ontology, index, advertising, important, science, linguistic, context, author, article, ngram, good, factor, text, machine, intent, reason, smote, language, frequency, identification, polarity, processing, learning, approach, evaluation, using, datasets, knn, ranking, natural, publication, technique, vector, textual, binary, quality, recommendation, term, ccro, nlpbased, f1000prime, ngrams, postpublication, disagreement, gst, positivity, meaningful, variation, serbian, 10k, sentiwordnet, retrieval, extraction, search, contextbased, tackle, authoritative, aspect, tax, reproducible, usefulness, cited, model, support, mapping, forest, span, determining, summarisation, wos, comment, tfidf, research, random, consumer, preliminary, semantic, semisupervised, multilayer, study, product, linguistics, disciplinary, communication, profiling, weight
Representative terms (set 1): citation, nlpbased, cited, ranking, classification, impact, publication, evaluation, postpublication, polarity
Representative terms (set 2): citation, classification, impact, polarity, using, ranking, nlpbased, ngrams, postpublication, sentiwordnet
Theme: Predictive Modelling of Citation Impact and Research Metrics
Topic 12 (34 articles). Label: Citation Prediction and Impact Analysis in Academic Research. Name: 12_count_citation_prediction_impact
Keywords: count, citation, prediction, impact, deep, network, feature, paper, journal, learning, neural, predicting, bert, using, scientific, graph, acknowledgement, factor, semantic, index, hindex, machine, academic, series, article, text, classification, ranking, analysis, temporal, publication, rule, time, dataset, boundaryspanning, burst, wlsa, anomalous, junior, missing, guidance, citescore, imputation, advance, span, summarisation, award, score, forecasting, shortterm, document, scholarly, popularity, variant, context, long, title, cnn, multivariate, scheme, contribution, memory, selection, weighted, detection, influential, future, knowledge, model, term, visual, national, function, abstract, value, link, cited, representation, convolutional, metadata, scientometrics, role, based, regression, researcher, finetuning, annotator, knni, affecting, lm, formulation, fox, ibro, an, topicscore, inflation, am, fundamental, frequencyinverse, aps
Representative terms (set 1): scientometrics, citation, cited, classification, predicting, weighted, ranking, scholarly, publication, convolutional
Representative terms (set 2): citation, neural, predicting, bert, classification, ranking, citescore, scientometrics, annotator, topicscore
Theme: ML/AI in Research Evaluation and Methodology
Topic 13 (31 articles). Label: Machine Learning for Domestic Violence Policing. Name: 13_policing_violence_qualitative_domestic
Keywords: policing, violence, qualitative, domestic, organizational, research, reporting, tool, machine, humanai, clinical, inquiry, evaluation, screening, appreciative, patrol, natural, usability, order, language, criterion, classification, measurement, collaboration, nlp, processing, learning, abuse, seamless, meter, authoring, eligibility, checklist, posttraumatic, prescreening, starml, foot, performance, evaluating, family, adversarial, welfare, phone, assessment, versus, sensemaking, portfolio, automation, inclusion, methodology, leveraging, scoring, knowledge, ai, text, justice, extension, classifying, crowdsourcing, stress, child, translational, social, reading, identification, transformer, using, case, focus, example, intelligent, potential, disorder, transfer, diversity, community, change, program, translation, record, discovery, task, authorship, platform, action, abstract, mobile, question, science, supervised, high, overview, open, decisionmaking, quality, domain, perception, work, data, human
Representative terms (set 1): classifying, classification, methodology, qualitative, research, evaluation, assessment, crowdsourcing, evaluating, nlp
Representative terms (set 2): policing, qualitative, tool, humanai, nlp, posttraumatic, sensemaking, ai, classifying, crowdsourcing
Theme: Impact of Deep Learning and Machine Learning on Research Collaboration Prediction and Network Analysis
Topic 31 (13 articles). Label: Machine Learning for Collaboration and Citation Prediction in Academic Research. Name: 31_link_collaboration_coauthorship_network
Keywords: link, collaboration, coauthorship, network, prediction, representation, elm, layer, autoencoder, hidden, recommender, learning, recommendation, topn, double, delmae, myelopathy, original, spatialtemporal, hierarchical, future, expressive, denoising, random, input, extreme, feature, predicting, multilayer, machine, deep, system, pattern, centrality, novel, research, author, supervised, evolution, training, collaborative, attentionbased, spondylotic, shall, architectural, helm, proximityaware, randomwalk, hdelm, restricted, similaritybased, decompression, degenerative, multinetwork, ratingtrend, nodal, closeness, fast, extremely, linkprediction, compact, autoencoders, cervical, incorporating, graphbased, similarity, approach, speed, decoding, discriminantanalysis, surgical, stochastic, graph, lead, description, attribute, encoding, factorization, scientific, machinelearning, interest, affiliation, signature, stacking, relevant, embeddings, neighbor, maximization, filtering, kernel, ecommerce, noise, matrix, coauthor, used, reduction, different, cooccurrence, capacity, leadership
Representative terms (set 1): linkprediction, graphbased, similaritybased, coauthorship, embeddings, centrality, autoencoder, network, coauthor, autoencoders
Representative terms (set 2): link, coauthorship, network, autoencoder, recommender, centrality, similaritybased, linkprediction, graphbased, embeddings
Theme: Impact of Deep Learning and Machine Learning on Citation Analysis and Recommender Systems
Topic 30 (13 articles). Label: Machine Learning for Research Impact Analysis. Name: 30_recommender_system_citation_context
Keywords: recommender, system, citation, context, recommendation, scientific, filtering, deep, summarization, discourse, paper, facet, multitopic, k12, learning, personalized, patent, pagerank, proximity, flow, semantics, analysis, measuring, influential, knowledge, coupling, elearning, representation, using, article, survey, news, trend, collaborative, ontological, contextualization, suggester, serendipity, informative, serendipitous, authorprofilebased, userprofile, bibliographic, similarity, natural, document, processing, market, recommending, kernelbased, implicit, light, candidate, managing, classification, neural, challenge, community, distributed, dbscan, modified, venue, specialty, machinelearning, recurrent, language, short, selforganizing, networkbased, academic, preference, datasets, investigating, long, machine, space, informationscience, network, memory, statistic, mining, novel, understanding, feedback, term, diffusion, management, journal, abstract, empirical, insight, comprehensive, map, hybrid, role, bibliometrics, economics, use, user, semantic
Representative terms (set 1): recommender, suggester, machinelearning, neural, survey, recommendation, informationscience, pagerank, classification, bibliometrics
Representative terms (set 2): citation, pagerank, contextualization, suggester, authorprofilebased, neural, dbscan, networkbased, informationscience, bibliometrics
Theme: Impact and Innovation in Bibliographic Data Extraction and Analysis
Topic 8 (44 articles). Label: Machine Learning for Bibliographic Disambiguation and Author Disambiguation. Name: 8_bibliographic_disambiguation_library_reference
Keywords: bibliographic, disambiguation, library, reference, digital, entity, metadata, extraction, author, book, record, information, machine, document, workshop, data, parsing, loan, birndl, retrieval, format, conditional, name, joint, bibliometricenhanced, language, random, parser, detection, deep, linkage, vector, learning, named, linking, processing, image, resolution, using, network, neural, support, natural, nlp, history, recommendation, server, equivalence, linked, bibframe, cip, commentary, subjectclassification, html, rdf, hyperdocument, based, coupling, hierarchical, authorship, layout, description, field, publication, century, approximate, bibliography, opensource, dblp, clustering, citation, historical, similarity, recognition, semantic, object, authority, summarization, benchmarking, svm, standard, type, validation, source, scheme, identity, electronic, improvement, computer, categorization, architecture, tool, classification, web, chinese, science, convolutional, recommender, work, map
Representative terms (set 1): bibliometricenhanced, bibframe, subjectclassification, rdf, library, reference, classification, citation, nlp, bibliography
Representative terms (set 2): bibliographic, disambiguation, library, bibliometricenhanced, linkage, bibframe, subjectclassification, rdf, hyperdocument, dblp
Theme: Impact of Proximity Dimensions and Topic Modelling on International Research Collaboration
Topic 33 (11 articles). Label: Digital Native and Immigrant Research Collaboration. Name: 33_nurse_immigrant_international_dimension
Keywords: nurse, immigrant, international, dimension, proximity, turnover, journalism, job, mobility, andalusian, citizen, employee, topic, modeling, native, researcher, collaboration, talent, network, complex, integration, digital, structural, satisfaction, using, regression, distance, bibliometrics, myth, newsroom, occupationalmobility, r12, registered, dissatisfaction, spillover, sixtytwo, hyptrails, immigration, informationseeking, impactable, intergenerational, migration, greatbritain, geometry, coinventorships, antecedent, ethnic, localization, explain, geography, 19942023, staffing, content, outcome, linear, analysis, bayesian, mortality, manual, migrant, o31, korean, coauthorships, register, revisiting, youth, dimensionality, databased, empiricalevidence, social, divide, earnings, german, embeddedness, modelingbased, health, database, medium, bibliometric, university, single, net, initiative, usergenerated, aid, structured, programme, multidisciplinary, pilot, dimensionsai, unitedstates, predictor, prospect, concrete, present, cooperation, individual, reduction, emergence, augmented
Representative terms (set 1): bibliometrics, bibliometric, structured, embeddedness, antecedent, impactable, researcher, multidisciplinary, databased, complex
Representative terms (set 2): proximity, network, bibliometrics, occupationalmobility, impactable, migration, coauthorships, dimensionality, databased, embeddedness
Advanced Techniques in Bibliometric and Scientometric Analysis for Understanding Research Impact and Collaboration2318Gender and Impact of Scholarly Research23_gender_quotient_information_newspaper[‘gender’, ‘quotient’, ‘information’, ‘newspaper’, ‘nonlocal’, ‘faculty’, ‘journal’, ‘internationality’, ‘productivity’, ‘doe’, ‘influence’, ‘processing’, ‘academic’, ‘rd’, ‘sourcenormalized’, ‘ocq’, ‘nliq’, ‘research’, ‘impact’, ‘collaboration’, ‘matter’, ‘natural’, ‘scientific’, ‘multidimensional’, ‘scraping’, ‘language’, ‘genderdifferences’, ‘blog’, ‘american’, ‘style’, ‘snip’, ‘management’, ‘sex’, ‘affect’, ‘data’, ‘outcome’, ‘gap’, ‘cooperation’, ‘international’, ‘visualization’, ‘social’, ‘background’, ‘performance’, ‘modeling’, ‘diversity’, ‘citation’, ‘determinant’, ‘web’, ‘function’, ‘supervised’, ‘writing’, ‘structural’, ‘article’, ‘news’, ‘word’, ‘topic’, ‘collaborator’, ‘ugnn’, ‘russia’, ‘refinitiv’, ‘latin’, ‘bertbased’, ‘soviet’, ‘replicability’, ‘journalist’, ‘stringmatching’, ‘scientobase’, ‘idiindex’, ‘metaphilosophy’, ‘female’, ‘jimi’, ‘fi’, ‘machinereadable’, ‘lected’, ‘knowledgesharing’, ‘wisdom’, ‘eld’, ‘informationsources’, ‘thomson’, ‘easy’, ‘selfpromotion’, ‘researchmethods’, ‘gatekeeping’, ‘microservices’, ‘chilean’, ‘chile’, ‘20082018’, ‘roduction’, ‘granular’, ‘builtin’, ‘philosophical’, ‘press’, ‘rhetorical’, ‘reuters’, ‘cobbdouglas’, ‘segregation’, ‘pyrolysis’, ‘said’, ‘descent’, ‘doctrine’][‘gender’, ‘researchmethods’, ‘citation’, ‘genderdifferences’, ‘impact’, ‘informationsources’, ‘academic’, ‘research’, ‘collaborator’, ‘multidimensional’][‘gender’, ‘impact’, ‘multidimensional’, ‘scraping’, ‘collaborator’, ‘bertbased’, ‘idiindex’, ‘knowledgesharing’, ‘informationsources’, ‘researchmethods’]
Machine Learning and Statistical Methods for Research Evaluation and Performance Assessment3410Machine Learning for Research Impact Analysis34_svm_evaluation_normalization_parallel[‘svm’, ‘evaluation’, ‘normalization’, ‘parallel’, ‘bp’, ‘vocational’, ‘application’, ‘algorithm’, ‘discipline’, ‘project’, ‘research’, ‘set’, ‘large’, ‘college’, ‘clinical’, ‘based’, ‘hownet’, ‘poverty’, ‘fourier’, ‘fourierkernel’, ‘fcm’, ‘alleviation’, ‘environment’, ‘computing’, ‘vector’, ‘photovoltaic’, ‘grade’, ‘performance’, ‘extraction’, ‘scientific’, ‘post’, ‘supplier’, ‘kernel’, ‘subjective’, ‘feature’, ‘multilayer’, ‘benefit’, ‘machine’, ‘index’, ‘space’, ‘development’, ‘regression’, ‘scheme’, ‘classifier’, ‘data’, ‘improve’, ‘student’, ‘credit’, ‘comparison’, ‘question’, ‘function’, ‘chinese’, ‘applied’, ‘automated’, ‘assessment’, ‘comprehensive’, ‘tree’, ‘similarity’, ‘semantic’, ‘decision’, ‘support’, ‘big’, ‘learning’, ‘method’, ‘study’, ‘model’][‘svm’, ‘classifier’][‘svm’, ‘evaluation’, ‘normalization’, ‘algorithm’, ‘hownet’, ‘vector’, ‘kernel’, ‘classifier’, ‘applied’]
Machine Learning, Artificial Intelligence and Advance Methodologies for Research Impact ApplicationsAdvanced Evaluation Models and Algorithms in Educational, Environmental, and Market Research2417Research Evaluation and Machine Learning24_evaluation_kmeans_english_based[‘evaluation’, ‘kmeans’, ‘english’, ‘based’, ‘teaching’, ‘clustering’, ‘model’, ‘platform’, ‘segmentation’, ‘boosting’, ‘tree’, ‘vision’, ‘power’, ‘mining’, ‘service’, ‘vector’, ‘eggshell’, ‘b2c’, ‘marking’, ‘wuqing’, ‘express’, ‘dark’, ‘crossborder’, ‘servqual’, ‘secondhand’, ‘belt’, ‘morality’, ‘couplet’, ‘decision’, ‘college’, ‘dbscan’, ‘image’, ‘algorithm’, ‘support’, ‘point’, ‘completeness’, ‘teacher’, ‘research’, ‘machine’, ‘tire’, ‘spot’, ‘sequence’, ‘coefficient’, ‘ecommerce’, ‘quality’, ‘diagram’, ‘svm’, ‘market’, ‘defect’, ‘foreign’, ‘ability’, ‘price’, ‘climate’, ‘strategic’, ‘data’, ‘business’, ‘adaptive’, ‘credit’, ‘translation’, ‘method’, ‘sustainability’, ‘regression’, ‘commodity’, ‘gbdt’, ‘duty’, ‘penetration’, ‘overseas’, ‘contour’, ‘boruta’, ‘mechine’, ‘salmonella’, ‘dex’, ‘svr’, ‘politics’, ‘car’, ‘ideological’, ‘densitybased’, ‘nadaboost’, ‘silhouette’, ‘svmbased’, ‘user’, ‘semantic’, ‘student’, ‘oriented’, ‘solution’, ‘databased’, ‘crack’, ‘manual’, ‘ideology’, ‘lighting’, ‘university’, ‘gradient’, ‘cycle’, ‘political’, ‘customer’, ‘digital’, ‘learning’, ‘logistics’, ‘road’, ‘big’][‘svm’, ‘kmeans’, ‘algorithm’, ‘clustering’, ‘evaluation’, ‘dbscan’, ‘method’, ‘segmentation’, ‘boosting’, ‘databased’][‘evaluation’, ‘kmeans’, ‘clustering’, ‘boosting’, ‘vector’, ‘dbscan’, ‘algorithm’, ‘ecommerce’, ‘svm’, ‘databased’]
Bibliometric and AI-Enhanced Analysis of Sustainable Development and Technological Innovations1727Sustainability and Machine Learning for SDGs17_sustainable_goal_sustainability_development[‘sustainable’, ‘goal’, ‘sustainability’, ‘development’, ‘energy’, ‘analysis’, ‘bibliometric’, ‘waste’, ‘sdgs’, ‘management’, ‘country’, ‘growth’, ‘solid’, ‘latent’, ‘bibliometrics’, ‘approach’, ‘procurement’, ‘socioeconomic’, ‘deionization’, ‘intelligence’, ‘artificial’, ‘wireless’, ‘environmental’, ‘allocation’, ‘dirichlet’, ‘governance’, ‘smart’, ‘text’, ‘removal’, ‘businessmanagement’, ‘ecopsychology’, ‘faradaic’, ‘ecocivilization’, ‘achieving’, ‘sourcing’, ‘hyacinth’, ‘corruption’, ‘cee’, ‘city’, ‘network’, ‘challenge’, ‘water’, ‘data’, ‘safety’, ‘insight’, ‘thing’, ‘sdg’, ‘wastewater’, ‘inclusive’, ‘united’, ‘forest’, ‘renewable’, ‘nation’, ‘mining’, ‘concept’, ‘treatment’, ‘ecological’, ‘evolution’, ‘big’, ‘tourism’, ‘carbon’, ‘view’, ‘perspective’, ‘global’, ‘internet’, ‘status’, ‘corporate’, ‘climate’, ‘trend’, ‘world’, ‘transportation’, ‘education’, ‘transfer’, ‘area’, ‘responsible’, ‘event’, ‘generation’, ‘mapping’, ‘machine’, ‘topic’, ‘clustering’, ‘efficiency’, ‘action’, ‘construction’, ‘state’, ‘vision’, ‘power’, ‘identifying’, ‘vosviewer’, ‘control’, ‘china’, ‘research’, ‘experimentalinductive’, ‘electrode’, ‘electrocoagulation’, ‘holistic’, ‘eichhorniacrassipes’, ‘energystorage’, ‘functionality’, ‘hexacyanoferrate’][‘bibliometric’, ‘bibliometrics’, ‘sdgs’, ‘sdg’, ‘sustainability’, ‘topic’, ‘sustainable’, ‘latent’, ‘environmental’, ‘achieving’][‘sustainability’, ‘bibliometric’, ‘sdgs’, ‘latent’, ‘socioeconomic’, ‘smart’, ‘ecopsychology’, ‘ecocivilization’, ‘sdg’, ‘wastewater’]
Bibliometric and Scientometric Analysis of Ethical and Responsible Research in AI282715Artificial Intelligence for Ethical and Responsible Research27_ethic_explainable_ai_responsible[‘ethic’, ‘explainable’, ‘ai’, ‘responsible’, ‘ethical’, ‘explanation’, ‘computer’, ‘vision’, ‘bibliometric’, ‘intelligence’, ‘artificial’, ‘graph’, ‘explainability’, ‘reproductive’, ‘xai’, ‘mapping’, ‘worldwide’, ‘citnetexplorer’, ‘synthetic’, ‘counterfactual’, ‘analysis’, ‘assisted’, ‘implication’, ‘issue’, ‘challenge’, ‘research’, ‘health’, ‘conference’, ‘legal’, ‘healthcare’, ‘survey’, ‘vosviewer’, ‘virtue’, ‘understandability’, ‘recourse’, ‘interpretability’, ‘ingredient’, ‘lem’, ‘posthoc’, ‘humanlike’, ‘hope’, ‘infertility’, ‘gaydar’, ‘elsi’, ‘secret’, ‘seven’, ‘exergaming’, ‘aisupported’, ‘commercial’, ‘tailoring’, ‘comprehensibility’, ‘surrogacy’, ‘strand’, ‘machinelike’, ‘mitochondrial’, ‘bias’, ‘content’, ‘international’, ‘metric’, ‘fairness’, ‘replacement’, ‘dilemma’, ‘sexual’, ‘bigdata’, ‘inequality’, ‘fair’, ‘case’, ‘data’, ‘burden’, ‘recovery’, ‘black’, ‘trustworthiness’, ‘guide’, ‘consideration’, ‘field’, ‘smote’, ‘proceeding’, ‘geographic’, ‘technology’, ‘oftheart’, ‘box’, ‘better’, ‘africa’, ‘molecular’, ‘digital’, ‘definition’, ‘highquality’, ‘interpretable’, ‘robotics’, ‘key’, ‘therapy’, ‘imaging’, ‘translational’, ‘gap’, ‘dissemination’, ‘publishing’, ‘innovation’, ‘participatory’, ‘system’, ‘past’][‘ai’, ‘bibliometric’, ‘aisupported’, ‘survey’, ‘highquality’, ‘research’, ‘interpretable’, ‘interpretability’, ‘explanation’, ‘humanlike’][‘ai’, ‘ethical’, ‘xai’, ‘citnetexplorer’, ‘assisted’, ‘interpretability’, ‘surrogacy’, ‘bigdata’, ‘oftheart’, ‘participatory’]
Bibliometric and Scientometric Analysis of Emerging Technologies in Cryptocurrency and Blockchain2815Blockchain and Artificial Intelligence for Research Impact Analysis28_blockchain_thing_internet_iot[‘blockchain’, ‘thing’, ‘internet’, ‘iot’, ‘5g’, ‘cryptocurrency’, ‘system’, ‘mobile’, ‘bitcoin’, ‘computing’, ‘machine’, ‘intrusion’, ‘security’, ‘detection’, ‘cloud’, ‘price’, ‘network’, ‘triangulation’, ‘autonomic’, ‘trusted’, ‘cryptocurrencies’, ‘dispute’, ‘prisma’, ‘challenge’, ‘ict’, ‘anomaly’, ‘resolution’, ‘environment’, ‘alternative’, ‘criterion’, ‘ml’, ‘paradigm’, ‘edge’, ‘dirichlet’, ‘allocation’, ‘bibliometric’, ‘making’, ‘latent’, ‘learning’, ‘future’, ‘direction’, ‘industrial’, ‘scientometric’, ‘analysis’, ‘research’, ‘domain’, ‘data’, ‘application’, ‘control’, ‘use’, ‘id’, ‘resourcemanagement’, ‘checking’, ‘tapr’, ‘prisma2020’, ‘enclave’, ‘confidentiality’, ‘mimo’, ‘contract’, ‘litigation’, ‘cyberphysical’, ‘currency’, ‘malpractice’, ‘ao’, ‘south’, ‘aquila’, ‘line’, ‘supplychain’, ‘hai’, ‘hedge’, ‘haven’, ‘1981’, ‘6g’, ‘unleashing’, ‘adr’, ‘smart’, ‘publication’, ‘intelligence’, ‘visualization’, ‘ai’, ‘embedded’, ‘contemporary’, ‘banking’, ‘volatility’, ‘multi’, ‘prominent’, ‘sdn’, ‘egovernance’, ‘korea’, ‘decision’, ‘analytics’, ‘operational’, ‘bibliometry’, ‘stacked’, ‘safe’, ‘mediation’, ‘bidirectional’, ‘advance’, ‘massive’, ‘agent’][‘iot’, ‘ict’, ‘scientometric’, ‘ai’, ‘blockchain’, ‘publication’, ‘bibliometric’, ‘ml’, ‘cyberphysical’, ‘smart’][‘blockchain’, ‘5g’, ‘cryptocurrency’, ‘bitcoin’, ‘ict’, ‘latent’, ‘scientometric’, ‘id’, ‘prisma2020’, ‘smart’]
Impact of Active Learning and Educational Innovations on Student Engagement and Learning Outcomes3212Flipped Classroom and Active Learning in Higher Education32_flipped_active_student_classroom[‘flipped’, ‘active’, ‘student’, ‘classroom’, ‘engagement’, ‘learning’, ‘curator’, ‘fashion’, ‘exhibition’, ‘multinational’, ‘metal’, ‘museum’, ‘continuous’, ‘interdisciplinary’, ‘lotkas’, ‘collaboration’, ‘blended’, ‘response’, ‘programming’, ‘implementation’, ‘education’, ‘law’, ‘program’, ‘engineering’, ‘college’, ‘undergraduate’, ‘automated’, ‘experience’, ‘review’, ‘border’, ‘summer’, ‘iated’, ‘cocurricular’, ‘hispanic’, ‘hsi’, ‘heavy’, ‘serving’, ‘confchem’, ‘millenia’, ‘gerontology’, ‘crossdisciplinary’, ‘publishable’, ‘experiential’, ‘extramural’, ‘nursingeducation’, ‘fermentation’, ‘20112015’, ‘asking’, ‘science’, ‘institution’, ‘20002015’, ‘aging’, ‘nigeria’, ‘learned’, ‘printing’, ‘engaging’, ‘guided’, ‘framework’, ‘strategy’, ‘community’, ‘successful’, ‘congress’, ‘hivaids’, ‘higher’, ‘assessment’, ‘tool’, ‘project’, ‘university’, ‘studying’, ‘exposure’, ‘technology’, ‘pathway’, ‘teaching’, ‘paper’, ‘bibliometrics’, ‘second’, ‘integrating’, ‘highquality’, ‘3d’, ‘graduate’, ‘device’, ‘investigation’, ‘competency’, ‘designbased’, ‘cooccurrence’, ‘highereducation’, ‘proposal’, ‘stem’, ‘design’, ‘current’, ‘bibliometric’, ‘common’, ‘research’, ‘biology’, ‘funding’, ‘characteristic’, ‘conference’, ‘subject’, ‘mathematics’, ‘activity’][‘bibliometric’, ‘education’, ‘bibliometrics’, ‘teaching’, ‘nursingeducation’, ‘institution’, ‘studying’, ‘learned’, ‘learning’, ‘student’][‘flipped’, ‘active’, ‘curator’, ‘collaboration’, ‘education’, ‘iated’, ‘cocurricular’, ‘experiential’, ‘nursingeducation’, ‘nigeria’]
Impact and Evaluation of Online Learning through Bibliometric and Scientometric Analysis747Machine Learning and Artificial Intelligence in Higher Education7_online_education_learning_student[‘online’, ‘education’, ‘learning’, ‘student’, ‘distance’, ‘higher’, ‘elearning’, ‘open’, ‘covid19’, ‘mooc’, ‘engagement’, ‘moocs’, ‘access’, ‘educational’, ‘course’, ‘performance’, ‘bibliometric’, ‘pandemic’, ‘community’, ‘cognitive’, ‘motivation’, ‘theory’, ‘school’, ‘acceptance’, ‘blended’, ‘belief’, ‘web’, ‘science’, ‘analysis’, ‘trend’, ‘technology’, ‘inquiry’, ‘impact’, ‘collaborative’, ‘medium’, ‘mathematics’, ‘participation’, ‘input’, ‘graduate’, ‘growth’, ‘teacher’, ‘stem’, ‘university’, ‘study’, ‘mode’, ‘social’, ‘research’, ‘success’, ‘model’, ‘epistemological’, ‘microcredentials’, ‘planned’, ‘facetoface’, ‘mindfulness’, ‘fun’, ‘challenge’, ‘informationretrieval’, ‘retention’, ‘affair’, ‘outreach’, ‘instructor’, ‘lecture’, ‘factor’, ‘curriculum’, ‘epistemic’, ‘citnetexplorer’, ‘massive’, ‘selfefficacy’, ‘mind’, ‘gamification’, ‘scopus’, ‘vosviewer’, ‘k12’, ‘presence’, ‘general’, ‘collaboration’, ‘outcome’, ‘mixed’, ‘stress’, ‘evidencebased’, ‘learner’, ‘knn’, ‘perspective’, ‘institution’, ‘facebook’, ‘scholarship’, ‘application’, ‘doctoral’, ‘choice’, ‘medical’, ‘20’, ‘intelligent’, ‘framework’, ‘competition’, ‘video’, ‘publication’, ‘agenda’, ‘change’, ‘critical’, ‘transformation’][‘moocs’, ‘mooc’, ‘elearning’, ‘bibliometric’, ‘online’, ‘education’, ‘educational’, ‘scholarship’, ‘institution’, ‘course’][‘online’, ‘education’, ‘elearning’, ‘mooc’, ‘engagement’, ‘moocs’, ‘bibliometric’, ‘success’, ‘factor’, ‘gamification’]
Impact of Active Learning and Assessment Strategies on Educational Outcomes1922Academic Curriculum Development and Assessment Using Machine Learning and Artificial Intelligence19_active_curriculum_teaching_learning[‘active’, ‘curriculum’, ‘teaching’, ‘learning’, ‘assessment’, ‘student’, ‘course’, ‘method’, ‘education’, ‘medicine’, ‘biology’, ‘molecular’, ‘strategy’, ‘astronomy’, ‘undergraduate’, ‘casebased’, ‘prehealth’, ‘research’, ‘orientation’, ‘exam’, ‘enzyme’, ‘integration’, ‘formative’, ‘laboratory’, ‘implementation’, ‘simulation’, ‘meme’, ‘disciplinebased’, ‘jenos’, ‘planetarium’, ‘biotechnology’, ‘drama’, ‘multiplechoice’, ‘teachingresearch’, ‘cbl’, ‘metacognitive’, ‘activity’, ‘professional’, ‘development’, ‘catalysis’, ‘evaluation’, ‘behavioral’, ‘postgraduate’, ‘mentoring’, ‘module’, ‘translational’, ‘expression’, ‘exercise’, ‘educational’, ‘pedagogy’, ‘improving’, ‘inquiry’, ‘group’, ‘thinking’, ‘critical’, ‘program’, ‘career’, ‘mobile’, ‘question’, ‘study’, ‘resource’, ‘using’, ‘skill’, ‘work’, ‘teacher’, ‘methodology’, ‘experience’, ‘design’, ‘economics’, ‘introductory’, ‘jena’, ‘juhspecific’, ‘parcel’, ‘outbreak’, ‘interestoriented’, ‘persuasive’, ‘promoting’, ‘workforce’, ‘medicaleducation’, ‘constructive’, ‘compensation’, ‘constructivism’, ‘interactivity’, ‘master’, ‘flexner’, ‘fom’, ‘king’, ‘kom’, ‘fourfactor’, ‘glycobiology’, ‘play’, ‘playing’, ‘lecturer’, ‘mme’, ‘misconception’, ‘lecturetutorials’, ‘socialwork’, ‘broad’, ‘coteaching’, ‘softskill’][‘teachingresearch’, ‘curriculum’, ‘education’, ‘pedagogy’, ‘teaching’, ‘medicaleducation’, ‘educational’, ‘prehealth’, ‘learning’, ‘coteaching’][‘active’, ‘curriculum’, ‘teaching’, ‘prehealth’, ‘exam’, ‘teachingresearch’, ‘pedagogy’, ‘medicaleducation’, ‘lecturetutorials’, ‘coteaching’]
Impact of Online and Digital Learning Innovations on Professional Development and Educational Outcomes222119Online Learning and Health Professional Education21_online_health_course_action[‘online’, ‘health’, ‘course’, ‘action’, ‘learning’, ‘assessment’, ‘education’, ‘care’, ‘virtual’, ‘student’, ‘training’, ‘cancer’, ‘pain’, ‘selfassessment’, ‘professional’, ‘pedagogical’, ‘skill’, ‘nurse’, ‘development’, ‘participatory’, ‘research’, ‘team’, ‘crossboundary’, ‘collaborate’, ‘counselor’, ‘hepatitis’, ‘guinea’, ‘palliative’, ‘discussion’, ‘socioconstructivist’, ‘shortage’, ‘school’, ‘collaboration’, ‘curriculum’, ‘work’, ‘asynchronous’, ‘experience’, ‘worker’, ‘teaching’, ‘distance’, ‘formative’, ‘load’, ‘qualitative’, ‘competency’, ‘blended’, ‘primary’, ‘designbased’, ‘nursing’, ‘openaccess’, ‘medical’, ‘psychology’, ‘creation’, ‘evaluation’, ‘mooc’, ‘higher’, ‘program’, ‘study’, ‘elearning’, ‘practice’, ‘classroom’, ‘disease’, ‘teacher’, ‘design’, ‘tribal’, ‘bl’, ‘encouraging’, ‘culturally’, ‘uganda’, ‘ol’, ‘patientcentred’, ‘pbl’, ‘educationprofessional’, ‘alaska’, ‘aidespractitioners’, ‘counselorsintraining’, ‘app’, ‘technologist’, ‘quasiexperimental’, ‘kenya’, ‘strengthen’, ‘spaced’, ‘share’, ‘provider’, ‘problembased’, ‘inform’, ‘inpatient’, ‘investigative’, ‘implementing’, ‘teachingintensive’, ‘supervisory’, ‘supervision’, ‘indigenous’, ‘struggling’, ‘hp’, ‘3429’, ‘juggling’, ‘counseling’, ‘continuing’, ‘conakry’, ‘communityhealth’][‘mooc’, ‘elearning’, ‘online’, ‘virtual’, ‘education’, ‘counselorsintraining’, ‘qualitative’, ‘teaching’, ‘educationprofessional’, ‘evaluation’][‘virtual’, ‘participatory’, ‘qualitative’, ‘nursing’, ‘mooc’, ‘elearning’, ‘educationprofessional’, ‘counselorsintraining’, ‘teachingintensive’, ‘communityhealth’]
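Per-topic keyword lists of this kind are typically produced by BERTopic's class-based TF-IDF (c-TF-IDF) scoring, which merges all documents in a topic into one bag of words and weights each term by its within-topic frequency against its frequency across all topics. The following is a minimal, self-contained sketch of that scoring, assuming simple whitespace tokenisation and a toy corpus; it illustrates the idea only and is not the pipeline used in this study.

```python
import math
from collections import Counter

def ctfidf_keywords(topic_docs, top_n=5):
    """Rank terms per topic with a class-based TF-IDF,
    in the spirit of BERTopic's c-TF-IDF representation."""
    # One bag of words per topic: all documents in a topic are merged.
    bags = {t: Counter(w for d in docs for w in d.lower().split())
            for t, docs in topic_docs.items()}
    # Average number of words per topic (the "A" in c-TF-IDF).
    avg_words = sum(sum(b.values()) for b in bags.values()) / len(bags)
    # Total frequency of each term across all topics.
    cross = Counter()
    for b in bags.values():
        cross.update(b)
    keywords = {}
    for t, b in bags.items():
        # Score = within-topic frequency * log(1 + A / cross-topic frequency).
        scored = {w: tf * math.log(1 + avg_words / cross[w])
                  for w, tf in b.items()}
        keywords[t] = [w for w, _ in sorted(scored.items(),
                                            key=lambda x: -x[1])][:top_n]
    return keywords

# Hypothetical miniature corpus keyed by topic number.
topics = {
    8: ["bibliographic disambiguation of author names",
        "entity disambiguation for library records"],
    17: ["sustainable development goals research",
         "bibliometric mapping of sustainability research"],
}
print(ctfidf_keywords(topics, top_n=3))
```

Terms that recur inside one topic but are rare elsewhere (here "disambiguation" for topic 8 and "research" for topic 17) rise to the top, which is how the distinctive term lists in the table above emerge.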
Figure A1. Topic modelling visualisation. (a) Documents and topics: Scatter plot showing document clusters across 35 topics, highlighting areas like citation analysis, altmetrics, and AI-driven research. (b) Similarity matrix: A heatmap illustrating topic overlap. Stronger connections are seen in bibliometrics and citation analysis, with less overlap in niche topics like AI in education. (c) Topic word scores: Key terms for each topic, such as ‘modelling’ for bibliometrics and ‘altmetrics’ for social media metrics, represented in bar charts. (d) Inter-topic distance map: Topic relationships. Thematically similar topics cluster closely, particularly in bibliometrics and ML applications.
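A similarity heatmap like panel (b) of Figure A1 can be approximated by taking the cosine similarity between per-topic keyword bags. The sketch below assumes bag-of-words topic representations over a few of the topics listed above; BERTopic itself derives such similarities from embeddings, so this is an illustrative simplification.

```python
import math
from collections import Counter

def topic_similarity(topic_keywords):
    """Pairwise cosine similarity between topics, each represented
    as a bag of keywords (a simplified stand-in for embeddings)."""
    vecs = {t: Counter(words) for t, words in topic_keywords.items()}

    def cos(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb)

    ts = sorted(vecs)
    # Full matrix as a dict keyed by (topic_i, topic_j).
    return {(i, j): round(cos(vecs[i], vecs[j]), 3) for i in ts for j in ts}

# Representative terms for three topics from the appendix table.
sim = topic_similarity({
    7:  ["online", "education", "elearning", "mooc", "student"],
    19: ["active", "curriculum", "teaching", "education", "student"],
    28: ["blockchain", "iot", "cryptocurrency", "bitcoin", "security"],
})
```

The two education topics (7 and 19) share vocabulary and score well above zero, while the blockchain topic (28) has no overlap with either, mirroring the strong within-theme blocks and sparse cross-theme cells visible in the heatmap.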

References

  1. Pinto, T.; Teixeira, A.A.C. The impact of research output on economic growth by fields of science: A dynamic panel data analysis, 1980–2016. Scientometrics 2020, 123, 945–978. [Google Scholar] [CrossRef]
  2. Penfield, T.; Baker, M.J.; Scoble, R.; Wykes, M.C. Assessment, evaluations, and definitions of research impact: A review. Res. Eval. 2014, 23, 21–32. [Google Scholar] [CrossRef]
  3. Bugajska, M. “KT” CarePacks—Collaboration Patterns for Knowledge Transfer: Learning from IS/IT-Outsourcing Case at a Swiss Financial Institution. In Proceedings of the Knowledge Management in Action, Milan, Italy, 7–10 September 2008; pp. 17–36. [Google Scholar]
  4. Konkiel, S.; Guichard, S. Altmetrics: “big data” that map the influence of New Zealand research. Libr. Hi Tech News 2018, 35, 1–5. [Google Scholar] [CrossRef]
  5. Abrishami, A.; Aliakbary, S. Predicting citation counts based on deep neural network learning techniques. J. Informetr. 2019, 13, 485–499. [Google Scholar] [CrossRef]
  6. Egger, R.; Yu, J. A Topic Modeling Comparison Between LDA, NMF, Top2Vec, and BERTopic to Demystify Twitter Posts. Front. Sociol. 2022, 7, 886498. [Google Scholar] [CrossRef]
  7. Bornmann, L.; Haunschild, R. Which people use which scientific papers? An evaluation of data from F1000 and Mendeley. J. Informetr. 2015, 9, 477–487. [Google Scholar] [CrossRef]
  8. Hirsch, J.E. An index to quantify an individual’s scientific research output. Proc. Natl. Acad. Sci. USA 2005, 102, 16569–16572. [Google Scholar] [CrossRef]
  9. Zahedi, Z.; Costas, R.; Wouters, P. How well developed are altmetrics? A cross-disciplinary analysis of the presence of ‘alternative metrics’ in scientific publications. Scientometrics 2014, 101, 1491–1513. [Google Scholar] [CrossRef]
  10. Haustein, S.; Bowman, T.D.; Costas, R. Interpreting “altmetrics”: Viewing acts on social media through the lens of citation and social theories. In Theories of Informetrics and Scholarly Communication; Sugimoto, C.R., Ed.; De Gruyter Saur: Berlin, Germany; Boston, MA, USA, 2016; pp. 372–406. [Google Scholar]
  11. Erdt, M.; Nagarajan, A.; Sin, S.-C.J.; Theng, Y.-L. Altmetrics: An analysis of the state-of-the-art in measuring research impact on social media. Scientometrics 2016, 109, 1117–1166. [Google Scholar] [CrossRef]
  12. Bornmann, L.; Marx, W. Methods for the generation of normalized citation impact scores in bibliometrics: Which method best reflects the judgements of experts? J. Informetr. 2015, 9, 408–418. [Google Scholar] [CrossRef]
  13. Priem, J.; Taraborelli, D.; Groth, P.; Neylon, C. Altmetrics: A Manifesto; DigitalCommons@University of Nebraska-Lincoln: Lincoln, NE, USA, 2011. [Google Scholar]
  14. Bar-Ilan, J.; Haustein, S.; Peters, I.; Priem, J.; Shema, H.; Terliesner, J. Beyond Citations: Scholars’ Visibility on the Social Web. arXiv 2012, arXiv:1205.5611. Available online: https://arxiv.org/abs/1205.5611 (accessed on 21 March 2025).
  15. Bornmann, L. Alternative metrics in scientometrics: A meta-analysis of research into three altmetrics. Scientometrics 2015, 103, 1123–1144. [Google Scholar] [CrossRef]
  16. Han, H.; Zha, H.; Giles, C.L. Name disambiguation in author citations using a K-way spectral clustering method. In Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries, Denver, CO, USA, 7–11 June 2005; pp. 334–343. [Google Scholar]
  17. Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent Dirichlet Allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. Available online: https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf (accessed on 21 March 2025).
  18. Turan, S.C.; Yildiz, K.; Büyüktanir, B. Comparison of LDA, NMF and BERTopic Topic Modeling Techniques on Amazon Product Review Dataset: A Case Study. In Proceedings of the Computing, Internet of Things and Data Analytics, Paris, France, 27–29 September 2024; pp. 23–31. [Google Scholar]
  19. Zhang, J.; Wolfram, D.; Ma, F. The impact of big data on research methods in information science. Data Inf. Manag. 2023, 7, 100038. [Google Scholar] [CrossRef]
  20. Tosi, D.; Kokaj, R.; Roccetti, M. 15 years of Big Data: A systematic literature review. J. Big Data 2024, 11, 73. [Google Scholar] [CrossRef]
  21. Wu, M.; Zhang, Y.; Li, X. Exploring Associations within Disease-Gene Pairs: Bibliometrics, Word Embedding, and Network Analytics. In Proceedings of the 2022 Portland International Conference on Management of Engineering and Technology (PICMET), Portland, OR, USA, 7–11 August 2022; pp. 1–7. [Google Scholar]
  22. Zuo, Z.; Zhao, K. Understanding and predicting future research impact at different career stages—A social network perspective. J. Assoc. Inf. Sci. Technol. 2021, 72, 454–472. [Google Scholar] [CrossRef]
  23. Vital, A.; Amancio, D.R. A comparative analysis of local similarity metrics and machine learning approaches: Application to link prediction in author citation networks. Scientometrics 2022, 127, 6011–6028. [Google Scholar] [CrossRef]
  24. Zhang, Y.; Tan, Y.-t.; Wang, M.-j.; Li, L.; Huang, J.-f.; Wang, S.-c. Bibliometric analysis of PTEN in neurodevelopment and neurodegeneration. Front. Aging Neurosci. 2024, 16, 1390324. [Google Scholar] [CrossRef]
  25. Zhao, S.-H.; Ji, X.-Y.; Yuan, G.-Z.; Cheng, T.; Liang, H.-Y.; Liu, S.-Q.; Yang, F.-Y.; Tang, Y.; Shi, S. A Bibliometric Analysis of the Spatial Transcriptomics Literature from 2006 to 2023. Cell. Mol. Neurobiol. 2024, 44, 50. [Google Scholar] [CrossRef]
  26. Zhou, Y.; Wu, T.; Sun, J.; Bi, H.; Xiao, Y.; Wang, H. Mapping the landscape and exploring trends in macrophage-related research within non-small cell lung cancer: A comprehensive bibliometric analysis. Front. Immunol. 2024, 15, 1398166. [Google Scholar] [CrossRef]
  27. Zhang, Y.; Li, J.; Wang, X.; Yang, Y.; Zhou, Z.; Deng, X.; Gao, Y.; Wang, P. A bibliometric analysis review of the Pennisetum (1970–2023). Front. Sustain. Food Syst. 2024, 8, 1405684. [Google Scholar] [CrossRef]
  28. Zhu, X.; Zhou, Z.; Pan, X. Research reviews and prospects of gut microbiota in liver cirrhosis: A bibliometric analysis (2001–2023). Front. Microbiol. 2024, 15, 1342356. [Google Scholar] [CrossRef]
  29. Żywiec, J.; Szpak, D.; Wartalska, K.; Grzegorzek, M. The Impact of Climate Change on the Failure of Water Supply Infrastructure: A Bibliometric Analysis of the Current State of Knowledge. Water 2024, 16, 1043. [Google Scholar] [CrossRef]
  30. Zhang, Y.; Shen, G.Q.; Xue, J. A Bibliometric Analysis of Supply Chain Management within Modular Integrated Construction in Complex Project Management. Buildings 2024, 14, 1667. [Google Scholar] [CrossRef]
  31. Zhao, P.; Wang, Y.; Xu, Z.; Chang, X.; Zhang, Y. Research progress of freeze–thaw rock using bibliometric analysis. Open Geosci. 2024, 16. [Google Scholar] [CrossRef]
  32. van Eck, N.J.; Waltman, L. Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010, 84, 523–538. [Google Scholar] [CrossRef] [PubMed]
  33. Purnell, P.J. Bibliometrics in the Context of Research Evaluation and Research Policy; Leiden University: Leiden, The Netherlands, 2025. [Google Scholar]
  34. Rosli, A.; Hassan, S.; Omar, M.H. Bibliometric Analysis of Trust in Named Data Networking: Insights and Future Directions. J. Adv. Res. Appl. Sci. Eng. Technol. 2024, 48, 269–282. [Google Scholar] [CrossRef]
  35. Gana, B.; Leiva-Araos, A.; Allende-Cid, H.; García, J. Leveraging LLMs for Efficient Topic Reviews. Appl. Sci. 2024, 14, 7675. [Google Scholar] [CrossRef]
  36. Deerwester, S.; Dumais, S.T.; Furnas, G.W.; Landauer, T.K.; Harshman, R. Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 1990, 41, 391–407. [Google Scholar] [CrossRef]
  37. Nugumanova, A.; Alzhanov, A.; Mansurova, A.; Rakhymbek, K.; Baiburin, Y. Semantic Non-Negative Matrix Factorization for Term Extraction. Big Data Cogn. Comput. 2024, 8, 72. [Google Scholar] [CrossRef]
  38. Medvecki, D.; Bašaragin, B.; Ljajić, A.; Milošević, N. Multilingual Transformer and BERTopic for Short Text Topic Modeling: The Case of Serbian. In Proceedings of the 13th International Conference on Information Society and Technology (ICIST), Kopaonik, Serbia, 12–15 March 2024; pp. 161–173. [Google Scholar]
  39. Bala, I.; Mitchell, L. Thematic exploration of educational research after the COVID pandemic through topic modelling. J. Appl. Learn. Teach. 2024, 7, 26. [Google Scholar] [CrossRef]
  40. Williams, C.Y.K.; Li, R.X.; Luo, M.Y.; Bance, M. Exploring patient experiences and concerns in the online Cochlear implant community: A cross-sectional study and validation of automated topic modelling. Clin. Otolaryngol. 2023, 48, 442–450. [Google Scholar] [CrossRef] [PubMed]
  41. Varavallo, G.; Scarpetti, G.; Barbera, F. The moral economy of the great resignation. Humanit. Soc. Sci. Commun. 2023, 10, 587. [Google Scholar] [CrossRef]
  42. Alieva, I.; Kloo, I.; Carley, K.M. Analyzing Russia’s propaganda tactics on Twitter using mixed methods network analysis and natural language processing: A case study of the 2022 invasion of Ukraine. EPJ Data Sci. 2024, 13, 42. [Google Scholar] [CrossRef]
  43. Dormosh, N.; Abu-Hanna, A.; Calixto, I.; Schut, M.C.; Heymans, M.W.; van der Velde, N. Topic evolution before fall incidents in new fallers through natural language processing of general practitioners’ clinical notes. Age Ageing 2024, 53, afae016. [Google Scholar] [CrossRef] [PubMed]
  44. Kastrati, Z.; Imran, A.S.; Daudpota, S.M.; Memon, M.A.; Kastrati, M. Soaring Energy Prices: Understanding Public Engagement on Twitter Using Sentiment Analysis and Topic Modeling With Transformers. IEEE Access 2023, 11, 26541–26553. [Google Scholar] [CrossRef]
  45. Fahimnia, B.; Sarkis, J.; Davarzani, H. Green supply chain management: A review and bibliometric analysis. Int. J. Prod. Econ. 2015, 162, 101–114. [Google Scholar] [CrossRef]
  46. Falagas, M.E.; Pitsouni, E.I.; Malietzis, G.A.; Pappas, G. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses. FASEB J. 2008, 22, 338–342. [Google Scholar] [CrossRef]
  47. Boyack, K.W.; Klavans, R.; Börner, K. Mapping the backbone of science. Scientometrics 2005, 64, 351–374. [Google Scholar] [CrossRef]
  48. Harder, V.S.; Stuart, E.A.; Anthony, J.C. Propensity Score Techniques and the Assessment of Measured Covariate Balance to Test Causal Associations in Psychological Research. Psychol. Methods 2010, 15, 234–249. [Google Scholar] [CrossRef]
  49. Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.L.; Marcus, D.; Colen, R.R.; et al. Federated learning in medicine: Facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 2020, 10, 12598. [Google Scholar] [CrossRef]
  50. Tang, J.; Sun, J.M.; Wang, C.; Yang, Z. Social Influence Analysis in Large-scale Networks. In Proceedings of the Kdd-09: 15th ACM Sigkdd Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; pp. 807–815. [Google Scholar]
  51. Bollen, J.; Van de Sompel, H.; Hagberg, A.; Chute, R. A Principal Component Analysis of 39 Scientific Impact Measures. PLoS ONE 2009, 4, e6022. [Google Scholar] [CrossRef]
  52. Li, K.; Rollins, J.; Yan, E. Web of Science use in published research and review papers 1997-2017: A selective, dynamic, cross-domain, content-based analysis. Scientometrics 2018, 115, 1–20. [Google Scholar] [CrossRef] [PubMed]
  53. Kumar, S.; Lim, W.M.; Sivarajah, U.; Kaur, J. Artificial Intelligence and Blockchain Integration in Business: Trends from a Bibliometric-Content Analysis. Inf. Syst. Front. 2023, 25, 871–896. [Google Scholar] [CrossRef] [PubMed]
  54. Wamba, S.F.; Queiroz, M.M. Responsible Artificial Intelligence as a Secret Ingredient for Digital Health: Bibliometric Analysis, Insights, and Research Directions. Inf. Syst. Front. 2023, 25, 2123–2138. [Google Scholar] [CrossRef]
  55. Ezugwu, A.E.; Shukla, A.K.; Nath, R.; Akinyelu, A.A.; Agushaka, J.O.; Chiroma, H.; Muhuri, P.K. Metaheuristics: A comprehensive overview and classification along with bibliometric analysis. Artif. Intell. Rev. 2021, 54, 4237–4316. [Google Scholar] [CrossRef]
  56. Golowko, N.; Tamla, P.; Stein, H.; Böhm, T.; Hemmje, M.; Onete, C.B. On the Trail of Future Management Topics with Digital Technology—How Can Artificial Intelligence Influence the Didactic Content of Higher Education in Economics? Education Excellence and Innovation Management Through Vision 2020. 2019, pp. 8145–8155. Available online: https://ibima.org/accepted-paper/on-the-trail-of-future-management-topics-with-digital-technology-how-can-artificial-intelligence-influence-the-didactic-content-of-higher-education-in-economics/ (accessed on 21 March 2025).
  57. Lopez-Martinez, R.E.; Sierra, G. Research Trends in the International Literature on Natural Language Processing, 2000–2019—A Bibliometric Study. J. Scientometr. Res. 2020, 9, 310–318. [Google Scholar] [CrossRef]
  58. Chen, X.L.; Zou, D.; Xie, H.R.; Cheng, G.; Liu, C.X. Two Decades of Artificial Intelligence in Education: Contributors, Collaborations, Research Topics, Challenges, and Future Directions. Educ. Technol. Soc. 2022, 25, 28–47. [Google Scholar]
  59. Huang, X.Y.; Zou, D.; Cheng, G.; Chen, X.L.; Xie, H.R. Trends, Research Issues and Applications of Artificial Intelligence in Language Education. Educ. Technol. Soc. 2023, 26, 112–131. [Google Scholar] [CrossRef]
  60. Lodge, J.M.; Thompson, K.; Corrin, L. Mapping out a research agenda for generative artificial intelligence in tertiary education. Australas. J. Educ. Technol. 2023, 39, 18. [Google Scholar] [CrossRef]
  61. Kartal, G.; Yesilyurt, Y.E. A bibliometric analysis of artificial intelligence in L2 teaching and applied linguistics between 1995 and 2022. ReCALL 2024, 36, 359–375. [Google Scholar] [CrossRef]
  62. Guo, S.C.; Zheng, Y.Y.; Zhai, X.M. Artificial intelligence in education research during 2013–2023: A review based on bibliometric analysis. Educ. Inf. Technol. 2024, 29, 16387–16409. [Google Scholar] [CrossRef]
  63. Van Eck, N.J.; Waltman, L. Bibliometric mapping of the computational intelligence field. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2007, 15, 625–645. [Google Scholar] [CrossRef]
  64. Bhattacharya, S. Some Salient Aspects of Machine Learning Research: A Bibliometric Analysis. J. Scientometr. Res. 2019, 8, S85–S92. [Google Scholar] [CrossRef]
  65. García-Sánchez, P.; Mora, A.M.; Castillo, P.A.; Pérez, I.J. A bibliometric study of the research area of videogames using Dimensions.ai database. In Proceedings of the 7th International Conference on Information Technology and Quantitative Management (ITQM 2019): Information Technology and Quantitative Management Based on Artificial Intelligence, Granada, Spain, 3–6 November 2019; Volume 162, pp. 737–744. [Google Scholar] [CrossRef]
  66. Taskin, Z.; Al, U. Natural language processing applications in library and information science. Online Inf. Rev. 2019, 43, 676–690. [Google Scholar] [CrossRef]
  67. Akrami, N.E.; Hanine, M.; Flores, E.S.; Aray, D.G.; Ashraf, I. Unleashing the Potential of Blockchain and Machine Learning: Insights and Emerging Trends From Bibliometric Analysis. IEEE Access 2023, 11, 78879–78903. [Google Scholar] [CrossRef]
  68. Onan, A. Two-Stage Topic Extraction Model for Bibliometric Data Analysis Based on Word Embeddings and Clustering. IEEE Access 2019, 7, 145614–145633. [Google Scholar] [CrossRef]
  69. Huang, L.; Cai, Y.J.; Zhao, E.D.; Zhang, S.T.; Shu, Y.; Fan, J. Measuring the interdisciplinarity of Information and Library Science interactions using citation analysis and semantic analysis. Scientometrics 2022, 127, 6733–6761. [Google Scholar] [CrossRef]
  70. Zhang, Y.; Lu, J.; Liu, F.; Liu, Q.; Porter, A.; Chen, H.S.; Zhang, G.Q. Does deep learning help topic extraction? A kernel k-means clustering method with word embedding. J. Informetr. 2018, 12, 1099–1117. [Google Scholar] [CrossRef]
  71. Dalavi, A.M.; Gomes, A.; Husain, A.J. Bibliometric analysis of nature inspired optimization techniques. Comput. Ind. Eng. 2022, 169. [Google Scholar] [CrossRef]
  72. Barbieri, N.; Bonchi, F.; Manco, G. Topic-aware social influence propagation models. Knowl. Inf. Syst. 2013, 37, 555–584. [Google Scholar] [CrossRef]
  73. Egghe, L. Theory and practise of the g-index. Scientometrics 2006, 69, 131–152. [Google Scholar]
  74. Ajzen, I. The theory of planned behavior. Organ. Behav. Hum. Decis. Process. 1991, 50, 179–211. [Google Scholar] [CrossRef]
  75. Bastian, M.; Heymann, S.; Jacomy, M. Gephi: An open source software for exploring and manipulating networks. In Proceedings of the International AAAI Conference on Web and Social Media, San Jose, CA, USA, 17–20 May 2009; pp. 361–362. [Google Scholar]
  76. Donthu, N.; Kumar, S.; Mukherjee, D.; Pandey, N.; Lim, W.M. How to conduct a bibliometric analysis: An overview and guidelines. J. Bus. Res. 2021, 133, 285–296. [Google Scholar] [CrossRef]
  77. Abramo, G.; D’Angelo, C.A.; Felici, G. Predicting publication long-term impact through a combination of early citations and journal impact factor. J. Informetr. 2019, 13, 32–49. [Google Scholar] [CrossRef]
  78. Aksnes, D.W. Characteristics of highly cited papers. Res. Eval. 2003, 12, 159–170. [Google Scholar] [CrossRef]
  79. Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Informetr. 2017, 11, 959–975. [Google Scholar] [CrossRef]
  80. Griffiths, T.L.; Steyvers, M. Finding scientific topics. Proc. Natl. Acad. Sci. USA 2004, 101, 5228–5235. [Google Scholar] [CrossRef] [PubMed]
  81. Bornmann, L.; Mutz, R. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 2015, 66, 2215–2222. [Google Scholar] [CrossRef]
  82. Boyack, K.W.; Newman, D.; Duhon, R.J.; Klavans, R.; Patek, M.; Biberstine, J.R.; Schijvenaars, B.; Skupin, A.; Ma, N.; Börner, K. Clustering more than two million biomedical publications: Comparing the accuracies of nine text-based similarity approaches. PLoS ONE 2011, 6, e18029. [Google Scholar] [CrossRef]
  83. Teufel, S.; Siddharthan, A.; Tidhar, D. Automatic classification of citation function. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, Sydney, Australia, 22–23 July 2006; pp. 103–110. [Google Scholar]
  84. Bonzi, S. Characteristics of a literature as predictors of relatedness between cited and citing works. J. Am. Soc. Inf. Sci. 1982, 33, 208–216. [Google Scholar] [CrossRef]
  85. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  86. Iqbal, S.; Saeed-Ul, H.; Aljohani, N.R.; Alelyani, S.; Nawaz, R.; Bornmann, L. A decade of in-text citation analysis based on natural language processing and machine learning techniques: An overview of empirical studies. Scientometrics 2021, 126, 6551–6599. [Google Scholar] [CrossRef]
  87. Kilicoglu, H.; Peng, Z.S.; Tafreshi, S.; Tran, T.; Rosemblat, G.; Schneider, J. Confirm or refute?: A comparative study on citation sentiment classification in clinical research publications. J. Biomed. Inform. 2019, 91, 103123. [Google Scholar] [CrossRef]
  88. Zhu, X.D.; Turney, P.; Lemire, D.; Vellino, A. Measuring Academic Influence: Not All Citations Are Equal. J. Assoc. Inf. Sci. Technol. 2015, 66, 408–427. [Google Scholar] [CrossRef]
  89. Huang, H.; Zhu, D.H.; Wang, X.F. Evaluating scientific impact of publications: Combining citation polarity and purpose. Scientometrics 2022, 127, 5257–5281. [Google Scholar] [CrossRef]
  90. Jha, R.; Jbara, A.A.; Qazvinian, V.; Radev, D.R. NLP-driven citation analysis for scientometrics. Nat. Lang. Eng. 2017, 23, 93–130. [Google Scholar] [CrossRef]
  91. Ihsan, I.; Qadir, M.A. CCRO: Citation’s Context & Reasons Ontology. IEEE Access 2019, 7, 30423–30436. [Google Scholar] [CrossRef]
  92. Wulff, P.; Westphal, A.; Mientus, L.; Nowak, A.; Borowski, A. Enhancing writing analytics in science education research with machine learning and natural language processing-Formative assessment of science and non-science preservice teachers’ written reflections. Front. Educ. 2023, 7, 1061461. [Google Scholar] [CrossRef]
  93. Su, X.; Prasad, A.; Kan, M.Y.; Sugiyama, K. Neural Multi-Task Learning for Citation Function and Provenance. In Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL 2019), Champaign, IL, USA, 2–6 June 2019; pp. 394–395. [Google Scholar] [CrossRef]
  94. Gowanlock, M.; Gazan, R. Assessing researcher interdisciplinarity: A case study of the University of Hawaii NASA Astrobiology Institute. Scientometrics 2013, 94, 133–161. [Google Scholar] [CrossRef]
  95. Basilio, M.P.; Pereira, V.; de Oliveira, M.; Neto, A.F.D.; de Moraes, O.C.R.; Siqueira, S.C.B. Knowledge discovery in research on domestic violence: An overview of the last fifty years. Data Technol. Appl. 2021, 55, 480–510. [Google Scholar] [CrossRef]
  96. Maphosa, V.; Maphosa, M. Artificial intelligence in higher education: A bibliometric analysis and topic modeling approach. Appl. Artif. Intell. 2023, 37, 2261730. [Google Scholar] [CrossRef]
  97. Chen, X.L.; Xie, H.R. A Structural Topic Modeling-Based Bibliometric Study of Sentiment Analysis Literature. Cogn. Comput. 2020, 12, 1097–1129. [Google Scholar] [CrossRef]
  98. Goodell, J.W.; Kumar, S.; Li, X.; Pattnaik, D.; Sharma, A. Foundations and research clusters in investor attention: Evidence from bibliometric and topic modelling analysis. Int. Rev. Econ. Financ. 2022, 82, 511–529. [Google Scholar] [CrossRef]
  99. Basilio, M.P.; Pereira, V.; de Oliveira, M. Knowledge discovery in research on policing strategies: An overview of the past fifty years. J. Model. Manag. 2022, 17, 1372–1409. [Google Scholar] [CrossRef]
  100. Choi, S.; Seo, J. An Exploratory Study of the Research on Caregiver Depression: Using Bibliometrics and LDA Topic Modeling. Issues Ment. Health Nurs. 2020, 41, 592–601. [Google Scholar] [CrossRef]
  101. Bittermann, A.; Fischer, A. How to Identify Hot Topics in Psychology Using Topic Modeling. Z. Psychol.-J. Psychol. 2018, 226, 3–13. [Google Scholar] [CrossRef]
  102. Chen, H.S.; Jin, Q.Q.; Wang, X.M.; Xiong, F. Profiling academic-industrial collaborations in bibliometric-enhanced topic networks: A case study on digitalization research. Technol. Forecast. Soc. Change 2022, 175, 121402. [Google Scholar] [CrossRef]
  103. Chen, H.; Deng, Z.J. Bibliometric Analysis of the Application of Convolutional Neural Network in Computer Vision. IEEE Access 2020, 8, 155417–155428. [Google Scholar] [CrossRef]
  104. Sharma, A.; Koohang, A.; Rana, N.P.; Abed, S.S.; Dwivedi, Y.K. Journal of Computer Information Systems: Intellectual and Conceptual Structure. J. Comput. Inf. Syst. 2023, 63, 37–67. [Google Scholar] [CrossRef]
  105. Ebadi, A.; Tremblay, S.; Goutte, C.; Schiffauerova, A. Application of machine learning techniques to assess the trends and alignment of the funded research output. J. Informetr. 2020, 14, 101018. [Google Scholar] [CrossRef]
  106. Garfield, E. The history and meaning of the journal impact factor. JAMA 2006, 295, 90–93. [Google Scholar] [CrossRef]
  107. Thelwall, M.; Haustein, S.; Larivière, V.; Sugimoto, C.R. Do altmetrics work? Twitter and ten other social web services. PLoS ONE 2013, 8, e64841. [Google Scholar] [CrossRef] [PubMed]
  108. Fortunato, S.; Bergstrom, C.T.; Börner, K.; Evans, J.A.; Helbing, D.; Milojević, S.; Petersen, A.M.; Radicchi, F.; Sinatra, R.; Uzzi, B. Science of science. Science 2018, 359, eaao0185. [Google Scholar] [CrossRef] [PubMed]
  109. Zupic, I.; Čater, T. Bibliometric methods in management and organization. Organ. Res. Methods 2015, 18, 429–472. [Google Scholar] [CrossRef]
  110. Roberts, M.E.; Stewart, B.M.; Tingley, D. Stm: An R package for structural topic models. J. Stat. Softw. 2019, 91, 1–40. [Google Scholar] [CrossRef]
  111. Van Eck, N.J.; Waltman, L. CitNetExplorer: A new software tool for analyzing and visualizing citation networks. J. Informetr. 2014, 8, 802–823. [Google Scholar] [CrossRef]
  112. Ji, T.; Self, N.; Fu, K.; Chen, Z.; Ramakrishnan, N.; Lu, C.-T. Citation Forecasting with Multi-Context Attention-Aided Dependency Modeling. ACM Trans. Knowl. Discov. Data 2024, 18, 144–167. [Google Scholar] [CrossRef]
  113. Abbas, K.; Hasan, M.K.; Abbasi, A.; Mokhtar, U.A.; Khan, A.; Abdullah, S.N.H.S.; Dong, S.; Islam, S.; Alboaneen, D.; Ahmed, F.R.A. Predicting the Future Popularity of Academic Publications Using Deep Learning by Considering It as Temporal Citation Networks. IEEE Access 2023, 11, 83052–83068. [Google Scholar] [CrossRef]
  114. Largent, M.A.; Lane, J.I. STAR METRICS and the Science of Science Policy. Rev. Policy Res. 2012, 29, 431–438. [Google Scholar] [CrossRef]
  115. Spaapen, J.; van Drooge, L. Introducing ‘productive interactions’ in social impact assessment. Res. Eval. 2011, 20, 211–218. [Google Scholar] [CrossRef]
  116. Benneworth, P.; Cunha, J. Universities’ contributions to social innovation: Reflections in theory & practice. Eur. J. Innov. Manag. 2015, 18, 508–527. [Google Scholar] [CrossRef]
  117. Reed, M.S.; Ferré, M.; Martin-Ortega, J.; Blanche, R.; Lawford-Rolfe, R.; Dallimer, M.; Holden, J. Evaluating impact from research: A methodological framework. Res. Policy 2021, 50, 104147. [Google Scholar] [CrossRef]
  118. Du, W.; Li, Z.; Xie, Z. A modified LSTM network to predict the citation counts of papers. J. Inf. Sci. 2022, 50, 894–909. [Google Scholar] [CrossRef]
  119. Maguire, S. Discourse and Adoption of Innovations: A Study of HIV/AIDS Treatments. Health Care Manag. Rev. 2002, 27, 74–88. [Google Scholar] [CrossRef] [PubMed]
  120. Bornmann, L.; Marx, W. How good is research really? EMBO Rep. 2013, 14, 226–230. [Google Scholar] [CrossRef] [PubMed]
  121. Zhang, J.Z.; Srivastava, P.R.; Sharma, D.; Eachempati, P. Big data analytics and machine learning: A retrospective overview and bibliometric analysis. Expert Syst. Appl. 2021, 184, 115561. [Google Scholar] [CrossRef]
Figure 1. Methodological framework.
Figure 2. Annual scientific production.
Figure 3. Document citation network.
Figure 4. Document co-citation network.
Figure 5. Document bibliographic coupling network.
Figure 6. Top 10 references in terms of citation bursts—sorted by strength of burst [5,8,17,83,106,107,108,109,110,111].
Figure 7. Publication citation network.
Figure 8. Publication co-citation network.
Figure 9. Publication bibliographic coupling network.
Figure 10. The 20 most influential co-authorship networks based on component size.
Figure 11. Co-authorship collaboration network.
Figure 12. Author citation network.
Figure 13. Author co-citation network.
Figure 14. Author bibliographic coupling network.
Figure 15. Institutional co-authorship network.
Figure 16. Institutional citation network.
Figure 17. Institutional bibliographic coupling network.
Figure 18. Annual publication trends for the top five contributing countries in research impact science (2000–2024).
Figure 19. Country co-authorship network.
Figure 20. Country citation network.
Figure 21. Country bibliographic coupling network.
Figure 22. Author keyword co-occurrence.
Figure 23. Associated keyword co-occurrence.
Table 1. Top 20 sources and their citations.

Sources | Publications 1 | Contribution Share (%) 1 | Total Citations 1 | H-Index 1 | Total Citations 2 | H-Index 2
--- | --- | --- | --- | --- | --- | ---
Scientometrics | 104 | 6.47 | 1843 | 23 | 57,465 | 144
IEEE Access | 53 | 3.3 | 607 | 11 | 916,390 | 242
Journal of Informetrics | 31 | 1.93 | 498 | 11 | 17,095 | 90
Journal of Scientometric Research | 16 | 1.03 | 74 | 4 | 71 | 8
Sustainability | 14 | 0.87 | 126 | 6 | 624,123 | 169
Education and Information Technologies | 12 | 0.75 | 124 | 5 | 26,127 | 76
Journal of Information Science | 12 | 0.75 | 115 | 5 | 9911 | 77
Technological Forecasting and Social Change | 12 | 0.75 | 281 | 8 | 114,767 | 179
IEEE Transactions on Engineering Management | 11 | 0.68 | 52 | 4 | 14,772 | 112
Frontiers in Psychology | 10 | 0.62 | 43 | 3 | 278,925 | 184
Quantitative Science Studies | 10 | 0.62 | 92 | 6 | 3110 | 24
Expert Systems with Applications | 8 | 0.5 | 107 | 6 | 273,480 | 271
Heliyon | 8 | 0.5 | 34 | 1 | 111,478 | 88
Journal of Intelligent & Fuzzy Systems | 8 | 0.5 | 56 | 5 | 50,258 | 82
PLoS ONE | 8 | 0.5 | 268 | 4 | 2,920,828 | 435
Applied Sciences | 7 | 0.44 | 40 | 3 | -- | --
Information Processing & Management | 7 | 0.44 | 182 | 5 | 38,430 | 123
Multimedia Tools and Applications | 7 | 0.44 | 23 | 3 | 109,404 | 106
Artificial Intelligence Review | 6 | 0.37 | 139 | 2 | 28,988 | 115
Collnet Journal of Scientometrics and Information Management | 6 | 0.37 | 33 | 4 | -- | --

1 Local dataset; 2 www.scimagojr.com, accessed on 17 October 2024.
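The H-Index columns in Table 1 follow the standard Hirsch definition: the largest h such that h publications each have at least h citations. A minimal sketch of that computation (the function name and the sample citation counts are illustrative, not drawn from the paper's dataset):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # this paper still supports a larger h
        else:
            break
    return h

# Illustrative citation counts for one source's papers in a local dataset:
print(h_index([48, 20, 11, 7, 7, 4, 1]))  # 5 papers with >= 5 citations -> 5
```

The same routine applies whether the citation counts come from the local 1608-article corpus (H-Index 1) or from a global source such as SCImago (H-Index 2); only the input list changes.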
Table 2. The 20 most productive authors.

Author | Publication Count 1 | Publication Fractionalized 1 | Productivity Contribution (%) 1 | Total Citations 2 | H-Index 2
--- | --- | --- | --- | --- | ---
Chen, Xieling | 20 | 4.63 | 0.29 | 4391 | 30
Xie, Haoran | 19 | 4.38 | 0.27 | 18,770 | 55
Zou, Di | 13 | 3.03 | 0.19 | 7332 | 41
Hassan, Saeed-Ul | 11 | 2.66 | 0.17 | 4116 | 37
Cheng, Gary | 11 | 2.16 | 0.13 | 4792 | 33
Aljohani, Naif Radi | 8 | 1.79 | 0.11 | 4825 | 38
Afzal, Muhammad Tanvir | 7 | 2.08 | 0.13 | 1825 | 24
Mayr, Philipp | 7 | 1.84 | 0.11 | 4198 | 26
Bornmann, Lutz | 6 | 2.75 | 0.17 | 31,430 | 88
Glanzel, Wolfgang | 6 | 2.33 | 0.15 | 27,548 | 89
Wang, Fu Lee | 6 | 1.10 | 0.07 | 5402 | 40
Jung, Sukhwan | 5 | 2.17 | 0.14 | 177 | 7
Thijs, Bart | 5 | 2.00 | 0.12 | 4259 | 33
Thoma, George R. | 5 | 1.83 | 0.11 | -- | --
Safder, Iqra | 5 | 1.17 | 0.07 | 565 | 13
Saha, Snehanshu | 5 | 1.09 | 0.07 | 2082 | 21
Xia, Feng | 5 | 0.82 | 0.05 | 21,276 | 72
Nawaz, Raheel | 5 | 0.75 | 0.05 | 5726 | 39
Amjad, Tehmina | 4 | 1.42 | 0.09 | 1445 | 20
Ahmed, Sheraz | 4 | 1.03 | 0.06 | 5533 | 36

1 Web of Science bibliometric dataset; 2 Google Scholar.
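The "Publication Fractionalized" column in Table 2 reflects the standard fractional-counting scheme, in which each paper credits 1/n to each of its n co-authors, and "Productivity Contribution (%)" is that fractional total over the 1608-article corpus (e.g., 4.63/1608 ≈ 0.29% for the top author). A sketch under those assumptions (the data structure and author names below are illustrative; the paper does not specify its implementation):

```python
def fractionalized_count(author, papers):
    """Fractional output: each paper credits 1/len(authors) to each co-author."""
    return sum(1 / len(authors) for authors in papers if author in authors)

# Each paper represented as its author list (illustrative toy data):
papers = [
    ["Author A", "Author B"],              # 0.5 credit each
    ["Author A", "Author B", "Author C"],  # 1/3 credit each
    ["Author C"],                          # 1.0 credit
]

frac = fractionalized_count("Author A", papers)  # 0.5 + 1/3
share = 100 * frac / 1608                        # contribution share (%) of the corpus
```

Fractional counting rewards small-team output, which is why authors with fewer total papers (e.g., Bornmann, 6 papers but 2.75 fractionalized) can out-rank more prolific but more heavily co-authored colleagues on this column.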
Table 3. The 20 most productive countries and their evolution.

Country | Publications | Share (%)
--- | --- | ---
China | 446 | 28.34
United States | 240 | 15.25
India | 108 | 6.86
Germany | 65 | 4.13
United Kingdom | 55 | 3.49
Spain | 51 | 3.24
Australia | 44 | 2.8
Canada | 37 | 2.35
France | 31 | 1.97
South Korea | 31 | 1.97
Pakistan | 30 | 1.91
Brazil | 27 | 1.72
Italy | 26 | 1.65
Netherlands | 24 | 1.52
Russia | 22 | 1.4
Japan | 21 | 1.33
Iran | 18 | 1.14
Romania | 17 | 1.08
Singapore | 17 | 1.08
Indonesia | 14 | 0.89
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Arsalan, M.H.; Mubin, O.; Al Mahmud, A.; Khan, I.A.; Hassan, A.J. Mapping Data-Driven Research Impact Science: The Role of Machine Learning and Artificial Intelligence. Metrics 2025, 2, 5. https://doi.org/10.3390/metrics2020005