1. Introduction
In recent years, interest in generative artificial intelligence (GAI) has surged, driven largely by its ability to autonomously generate various types of content, including text, images, speech, and videos. Traditionally, these tasks were performed by humans, but the advent of AI has significantly transformed this landscape. Global interest in GAI has been further fueled by the introduction of innovative, user-friendly services like ChatGPT, which have made these technologies more accessible to the public [1]. ChatGPT, a large-scale AI language model, responds to questions with remarkable speed, accuracy, and quality, surpassing traditional chatbot services [2].
As shown in Figure 1, by August 2024, ChatGPT dominated the GAI chatbot market in the United States, holding a 60% market share, followed by Microsoft Copilot, which is also based on ChatGPT, at 14.6% [3].
Recent studies, including Aljanabi’s, have demonstrated that ChatGPT is capable of emulating human-like discourse and providing insightful responses to complex queries [4]. Trained on an extensive corpus of textual data, ChatGPT has gained a deep understanding of numerous subjects, enabling it to engage in coherent and contextually relevant discussions. One of its most remarkable features is its ability to address intricate inquiries. It can analyze nuanced questions, break them down into meaningful components, and generate structured responses that consider the context and subtleties of the conversation [5].
As illustrated in Figure 2, ChatGPT’s capabilities have led to its widespread use in various fields, including healthcare [6,7], business [8], education [9,10], and content creation [11]. Remarkably, ChatGPT reached one million users within just five days of its launch, a feat that took Netflix three-and-a-half years and Instagram two-and-a-half months to achieve [12].
Despite the widespread adoption of ChatGPT, a debate continues regarding the potential benefits and challenges posed by GAI models in various industries [13]. On the positive side, large-scale AI services like ChatGPT have the potential to enhance productivity in business processes and significantly impact industries such as healthcare, law, and media [14]. For instance, in healthcare, ChatGPT can analyze patient data to suggest personalized treatment plans; in law, it can efficiently review vast volumes of legal documents; and in media, it can automatically generate content to provide timely information. The adoption of such AI technologies has the potential to reduce costs, improve efficiency, and enhance service delivery.

However, concerns remain about the risks associated with these technologies, particularly the potential for misinformation, misuse, and exploitation by malicious actors for cyberattacks and criminal activities [15]. For example, ChatGPT could be used to develop malicious software, automate phishing attacks, or spread misinformation, leading to social disruption. Consequently, it is crucial to uphold ethical standards and establish regulations to mitigate these risks [14]. As GAI technologies continue to evolve rapidly, the discussion surrounding their safe integration into industries, and the appropriate responses to these changes, remains ongoing.
Therefore, further research is essential to explore the use of GAI models from a security perspective, ensuring that they can be safely and appropriately deployed across various industrial applications in the future. The objective of this study was to assess the potential security impact of using GAI models, such as ChatGPT, and to provide practical recommendations for their safer deployment in real-world applications. To achieve this, 1047 SCI-level academic articles on GAI and security were collected from the SCOPUS database and analyzed using scientometric methods. The goal of this analysis was to identify both the strengths and weaknesses of GAI, offering strategic insights into the opportunities and threats that it may pose to industrial security. Unlike conventional literature reviews, this study applies advanced text mining techniques to extract meaningful patterns from a large dataset, providing novel insights into GAI security.
The remainder of this paper is structured as follows: Section 2 reviews related work and existing studies on large language model (LLM) security. Section 3 describes the research methodology, covering the data collection process (Section 3.1) as well as the preprocessing techniques and analytical methods, namely TF-IDF analysis, keyword centrality analysis, and LDA topic modeling (Section 3.2). Section 4 presents the results, including the literature analysis, keyword-based analyses, and topic modeling outcomes, together with the coherence score measurement, topic classification, and topic weight distribution. Section 5 discusses the key findings, implications, and potential limitations of this study. Finally, Section 6 concludes the paper and suggests future research directions.
2. Related Work
Building upon the discussion in Section 1, it is essential to explore research trends related to GAI, particularly in the context of security. As GAI technologies such as ChatGPT are increasingly integrated into various industries, it is crucial to identify and address the potential issues arising from this integration. Despite the relatively recent emergence of GAI, a significant body of research has already been established, exploring not only the technical capabilities of these models but also their applications across various socio-economic sectors, including tourism, marketing, manufacturing, trade, and education [4].
For instance, Rudolph et al. (2023) examined the challenges of value judgment and the application of GAI in higher education, emphasizing the complexities of integrating such advanced technologies into academic settings [16]. Similarly, Baidoo-Anu et al. (2023) analyzed the potential benefits of utilizing GAI in educational environments, highlighting its ability to enhance student engagement and enable personalized learning experiences [17]. In the financial and business sectors, Wang et al. (2023) and Srivastava (2023) investigated how GAI could improve operational efficiency and support decision-making processes, demonstrating its profound impact on industry practices [18,19]. In healthcare, Kim (2025) utilized insights from ChatGPT to examine the transformative role of mobile health (mHealth) in managing obesity. The study identified trends such as personalized interventions, emerging technologies, remote monitoring, behavioral strategies, and user engagement. Additionally, it highlighted mHealth’s potential to improve equity, self-management, and evidence-based practices, while addressing challenges related to privacy and interdisciplinary collaboration [20].
Despite the recognized potential of GAI models like ChatGPT, there is a growing discourse surrounding the ethical and legal challenges arising from their technological limitations. Ethical concerns include biases embedded in training datasets, which may result in skewed or unfair outcomes, as well as the risk of plagiarism due to inaccuracies in citations and sources [21]. Furthermore, the use of AI-generated content raises broader questions about its authenticity and originality. On the legal front, discussions have centered on issues such as authorship recognition, intellectual property rights, and copyright infringement related to content generated by ChatGPT [22].
In the educational context, significant concerns have emerged regarding academic dishonesty, particularly the potential for cheating during exams due to an over-reliance on AI-generated content [23]. From an academic integrity standpoint, there are worries about the misuse of generative AI to produce fabricated data or entirely fictitious research papers, which could severely undermine the credibility and trustworthiness of scholarly work [24].
This study aims to distinguish itself from previous research by employing advanced topic modeling techniques to analyze and visualize the key issues surrounding GAI and security. By uncovering deeper semantic structures within the existing body of research, this approach seeks to provide a more nuanced understanding of the potential opportunities and threats posed by the integration of GAI across industries, with a particular focus on security.
Building on the discussion of research trends related to GAI and the importance of analyzing these trends in the context of security, it is essential to explore methods that effectively quantify and evaluate the growing body of literature. While qualitative reviews provide valuable insights, they are limited in their ability to comprehensively assess large volumes of research. To address this, scientometric methods, such as text mining, have emerged as powerful quantitative tools for evaluating the significance of articles and authors in domains like cybersecurity and AI technology [25]. These methods enhance the review process by offering a structured, data-driven approach to understanding the relationships between research papers through graphical representation. However, while text mining can identify patterns in research, it often lacks the granularity needed to capture the nuanced, topic-specific context of these studies [26]. To bridge this gap, latent Dirichlet allocation (LDA) topic modeling has emerged as a more advanced technique for discovering and mining latent information within a research domain. LDA topic modeling utilizes statistical and optimization algorithms to extract semantic insights from large text corpora, making it a valuable tool for systematically analyzing textual data [27]. This method has been widely applied across various fields, including scientific research trends, social networks, political discourse, biomedical recommendations, and public health initiatives [25]. In technology and cybersecurity, LDA topic modeling is particularly useful for identifying patterns in cybersecurity incidents, studying public perceptions of privacy, and uncovering emerging trends in artificial intelligence research.

Several recent studies have explored the security implications of large language models (LLMs) and generative AI. Zhou et al. (2023) conducted a broad review of adversarial attacks on LLMs, focusing on how AI-generated text can be manipulated for misinformation and phishing attacks [18]. Wang and Li (2024) examined privacy risks associated with LLMs, including data leakage and user profiling concerns. Chen et al. (2023) analyzed LLMs’ robustness, assessing their resilience against adversarial prompts and bias exploitation [18,19]. Singh et al. (2023) reviewed regulatory and ethical frameworks surrounding LLM security, emphasizing policy recommendations rather than technical vulnerabilities [23].

A significant body of research has also focused on the application of text mining, including LDA topic modeling, to provide overviews of scientific papers and uncover research topics. These studies can be broadly categorized into two groups [27]. In the first category, text mining is employed to provide broad overviews across a wide range of research areas. For instance, Hall et al. used unsupervised text mining techniques to analyze historical trends in computational linguistics [28], while Yau et al. applied LDA topic modeling to cluster scientific documents and evaluate their findings [29]. The second category focuses on identifying trends within specific fields through text mining. For example, Întorsureanu et al. conducted a bibliometric analysis using text mining to evaluate academic publications on generative AI in education, analyzing research trends, collaboration networks, sentiment, and topics. Their study identified the evolution of AI-related terms, a growing positive sentiment in academic discourse, and emerging research themes, such as personalized learning, AI-driven gamification, and hybrid human–AI classrooms [30]. Similarly, Choi et al. used text mining to analyze document abstracts and identify trends in personal data privacy research [31]. More recently, Kim et al. applied structural text network analysis to study articles on maritime autonomous surface ships (MASSs), identifying key trends in this rapidly evolving field [32].
However, despite these contributions, existing studies on LLM security have primarily adopted qualitative, conceptual approaches, often lacking systematic mapping of the research landscape through scientometric methods. Many prior reviews have explored security challenges in LLMs, focusing on adversarial attacks, privacy risks, bias exploitation, and regulatory considerations. While these studies provide valuable insights into distinct security threats, they primarily synthesize existing knowledge without offering a quantitative, data-driven analysis of research trends.

In contrast, this study employs text mining techniques, including TF-IDF, keyword centrality analysis, and LDA topic modeling, to systematically analyze 1047 peer-reviewed articles on LLM security. This approach enables the identification of major research clusters, thematic structures, and underexplored areas in LLM security research, offering a broader and more structured perspective. Furthermore, unlike prior reviews that focus on individual security risks, our study systematically maps the evolution of security-related discussions in LLM research and identifies interconnections between emerging security issues. By leveraging scientometric techniques, we aim to uncover hidden research patterns that are not apparent in qualitative reviews.

In alignment with these advanced methodological approaches, this study utilizes natural language processing (NLP)-based text mining, including LDA topic modeling, to investigate key security topics in generative AI applications such as ChatGPT. Before conducting the text mining analysis, TF-IDF analysis and keyword centrality analysis were performed to identify major keywords, followed by LDA topic modeling to extract deeper thematic insights. The findings, derived from both intersubjective and independent perspectives, were analyzed in relation to the existing literature. The ultimate goal of this study is to provide a comprehensive, data-driven overview of current research trends, offer quantitative insights into the evolving security landscape, and suggest potential directions for future research on the safety of generative AI in industrial applications.
4. Results
4.1. Literature Analysis Results
4.1.1. Status by Country
The analysis of GAI research related to security was conducted by examining the distribution of publications across various countries. Figure 5 presents the distribution of research publications by country or region, offering a global perspective on research activities in this field.
As shown in the figure, the United States leads the research output, with over 250 publications, reflecting its significant focus and investment in GAI and security. China follows closely with nearly 200 publications, demonstrating its substantial contribution to the field. India ranks third, with more than 100 documents, indicating a growing presence in GAI-related research. The United Kingdom, with a notable number of publications, ranks fourth globally.
Other countries, including Australia, Germany, Canada, and Italy, contribute moderately to the research landscape, with publication numbers ranging between 50 and 100 each. South Korea and Saudi Arabia, albeit with lower publication volumes, also participate in the global research efforts on GAI and security. This distribution underscores the international nature of GAI research, with significant contributions from both Western and Asian countries. It also highlights the widespread recognition of GAI’s importance in addressing security challenges in various regions.
4.1.2. Status by Institution
Further analysis of GAI research related to security was conducted by examining the contributions of academic and research institutions worldwide. Figure 6 presents the numbers of documents published by leading institutions, offering insight into the primary contributors in the field.
The Chinese Academy of Sciences emerges as the most prolific institution, with over 20 publications, reflecting China’s strong emphasis on advancing GAI research, particularly in the context of security. It is closely followed by the Ministry of Education of the People’s Republic of China, underscoring the impact of national education policies and initiatives on driving research output. This demonstrates a coordinated national effort to prioritize GAI research.
The National University of Singapore is also a major contributor, reflecting Singapore’s strategic investment in technology and innovation. Tsinghua University and the University of the Chinese Academy of Sciences are other prominent Chinese institutions contributing significantly to the field, further reinforcing China’s leadership in GAI research.
Internationally, institutions such as the University of Illinois Urbana–Champaign, University College Dublin, and UNSW Sydney are key players, each contributing between 10 and 15 publications. These institutions reflect the global nature of GAI research, with active participation from leading universities in the United States, Europe, and Australia. Other institutions, including Nanjing University and Universiti Sains Malaysia, also make noteworthy contributions, demonstrating the diversity of actors involved in advancing GAI and security research.
This analysis highlights the dominance of Chinese institutions in GAI security research, while also recognizing the significant contributions of leading institutions from other regions. This global participation is vital for the continued development and application of GAI technologies to address security challenges.
4.2. Results of Keyword Frequency (TF-IDF) Analysis
A term frequency–inverse document frequency (TF-IDF) analysis was performed to evaluate the significance of key terms in the GAI security research corpus. This analysis provides insights into the most frequently occurring words, as well as those with unique significance, as indicated by their TF-IDF scores in Table 2.
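To make the computation concrete, the following minimal sketch reproduces this kind of term-frequency and TF-IDF tabulation with scikit-learn, where a term’s weight in a document is tf(t, d) × log(N/df(t)), up to the library’s smoothing conventions. The three sample abstracts are hypothetical stand-ins for the 1047 SCOPUS abstracts, so the scores it prints are purely illustrative.

```python
"""Sketch of the TF-IDF step on a toy corpus; the real study uses
1047 preprocessed SCOPUS abstracts, which are not reproduced here."""
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical stand-ins for the preprocessed abstracts.
abstracts = [
    "chatgpt model security in education and student data privacy",
    "large language model llm data processing and privacy risk",
    "ai system security risk management and emerging technology",
]

count_vec = CountVectorizer()
counts = count_vec.fit_transform(abstracts)   # raw term frequencies

tfidf_vec = TfidfVectorizer()                 # same default tokenization,
tfidf = tfidf_vec.fit_transform(abstracts)    # so vocabularies align

summary = (
    pd.DataFrame({
        "term": count_vec.get_feature_names_out(),
        "frequency": counts.sum(axis=0).A1,   # corpus-wide counts
        "tfidf": tfidf.sum(axis=0).A1,        # summed TF-IDF weight
    })
    .sort_values("tfidf", ascending=False)
)
print(summary.head(10))
```

As in Table 2, ranking by the summed TF-IDF weight rather than by raw frequency surfaces terms that are distinctive to particular documents, not merely ubiquitous.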
The term “ChatGPT” emerged as the most frequent word, appearing 2799 times across the documents, yet its TF-IDF score of 602 is relatively moderate for that frequency. This suggests that while “ChatGPT” is commonly mentioned, its occurrence is spread across multiple documents, diminishing its contribution to the uniqueness of any individual document. Consequently, this term is central to the overall research landscape, but it may not be the defining focus of specific studies.

In contrast, the term “model” appears only 1503 times yet attains the same TF-IDF score of 602, highlighting its importance both in terms of frequency and its distinct presence in certain documents. This likely reflects discussions on GAI models and their implications for security. Similarly, “AI” (artificial intelligence) is mentioned 1271 times, with a notable TF-IDF score of 433, emphasizing its pivotal role in research centered on generative AI.
Other prominent terms include “language”, “data”, and “intelligence”, which rank highly in both frequency and TF-IDF scores. For example, the term “language” appears 958 times, with a TF-IDF score of 530, underscoring its critical importance in discussions of large language models (LLMs), a key component of GAI research. The terms “data” (TF-IDF: 408) and “intelligence” (TF-IDF: 367) are also central to analyses of information systems and intelligence frameworks—crucial aspects in the application of GAI to security challenges.
Terms such as “technology”, “tool”, and “application” also exhibit moderate-to-high TF-IDF scores, signaling their relevance in discussions about the practical implementation of GAI technologies across various domains. These words reflect the growing interest in how GAI can be operationalized for specific use cases, particularly in areas like security and technology management.
Additionally, terms such as “challenge” (TF-IDF: 307) and “analysis” (TF-IDF: 297) suggest ongoing discussions about the challenges and methodological considerations in integrating GAI into existing security frameworks. These words highlight the practical obstacles and strategic analysis required for the effective deployment of GAI technologies in sensitive environments.
Finally, terms related to “security” (TF-IDF: 265) and “information” (TF-IDF: 250) are central to the discourse, reflecting the research focus on securing data and systems in the context of GAI applications. Although these terms appear less frequently, their high TF-IDF scores indicate their critical importance in specific documents, especially those addressing core research questions on the security implications of GAI.
Overall, this analysis demonstrates that while certain terms, such as “ChatGPT” and “AI”, are frequently mentioned across the document set, the TF-IDF scores provide a deeper understanding of the nuanced importance of other terms in specific contexts. These insights help to clarify the key concepts driving research in GAI and security, revealing both the breadth and depth of ongoing discussions in the field.
4.3. Results of Keyword Centrality Analysis
A keyword centrality analysis was conducted to identify the most influential terms in research related to GAI and security. This analysis utilized three centrality measures: degree centrality, closeness centrality, and betweenness centrality, as summarized in Table 3.
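As a rough illustration of how such scores can be obtained, the sketch below builds a keyword co-occurrence network and computes the three measures with networkx; for a node v in a network of n keywords, degree centrality is deg(v)/(n − 1), closeness is (n − 1)/Σ_u d(v, u), and betweenness is the share of shortest paths passing through v. The per-document keyword sets are hypothetical, and the study’s actual network construction may differ.

```python
"""Sketch of the keyword centrality step on a toy co-occurrence network."""
from itertools import combinations

import networkx as nx

# Hypothetical per-document keyword sets after preprocessing.
doc_keywords = [
    {"chatgpt", "ai", "model", "security"},
    {"model", "llm", "language", "data"},
    {"chatgpt", "education", "student", "security"},
]

G = nx.Graph()
for kws in doc_keywords:
    for u, v in combinations(sorted(kws), 2):   # link co-occurring keywords
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1
        else:
            G.add_edge(u, v, weight=1)

degree = nx.degree_centrality(G)                # direct connections
closeness = nx.closeness_centrality(G)          # reachability of the rest
betweenness = nx.betweenness_centrality(G)      # bridging role

for term in sorted(G, key=degree.get, reverse=True)[:5]:
    print(f"{term:10s} degree={degree[term]:.3f} "
          f"closeness={closeness[term]:.3f} betweenness={betweenness[term]:.3f}")
```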
Degree centrality quantifies the number of direct connections that a keyword has within the research network, reflecting its prominence. The term “ChatGPT” emerges with the highest degree centrality score (0.93878), making it the most connected and central keyword in the network. It is followed closely by “AI” (0.87755) and “model” (0.81633), both of which also exhibit strong connectivity, signifying their prominent presence in the research corpus. Other keywords, including “result” (0.81633), “datum” (0.79592), and “LLM” (0.77551), also display high degree centrality, reflecting their frequent co-occurrence with other terms and their integral role in the discourse surrounding GAI and security.
Closeness centrality measures how easily a keyword can connect to all other keywords in the network, providing insight into its accessibility and influence. The term “model” ranks highest in closeness centrality (0.67606), indicating that it is central to the network and can be quickly connected to other keywords. Other highly ranked terms include “language” (0.5931) and “LLM” (0.28357), which play key roles in bridging various concepts within the GAI and security discourse. Interestingly, despite its high degree centrality, “ChatGPT” has a relatively lower closeness centrality (0.25657), suggesting that although it is frequently mentioned, its connections are less direct or influential in linking other topics within the network. This finding implies that while “ChatGPT” is an essential term, it does not serve as a primary hub for reaching other keywords in the network.
Betweenness centrality assesses how effectively a keyword acts as a bridge, facilitating connections between different groups of keywords. “ChatGPT” once again leads with the highest betweenness centrality (0.02865), underscoring its role as a key connector in the research landscape. “AI” (0.02539) and “result” (0.02225) also play vital bridging roles, helping to link different research topics and clusters. Other terms, such as “datum” (0.01983), “model” (0.01907), and “technology” (0.01786), exhibit strong betweenness centrality, indicating their importance in connecting different research areas within the broader GAI and security domain.
In summary, this analysis highlights the varying roles of “ChatGPT”, “AI”, and “model” within the research network. While “ChatGPT” and “AI” are highly connected and serve as important bridges within the research landscape, “model” stands out in terms of accessibility, playing a crucial role in directly linking different keywords across the landscape. The prominence of terms like “technology”, “LLM”, and “analysis” across all centrality measures emphasizes their significant contributions to discussions of GAI and security, further illustrating the complexity and interconnectedness of research topics in this field.
4.4. Analysis Results of LDA Topic Modeling
4.4.1. Coherence Score Measurement Results
The optimal number of topics for LDA modeling was determined based on the methodology of Kim et al. [32], with the parameters set as follows: α = 0.1, β = 0.01, and 1000 iterations. The number of topics was gradually increased from two to fifteen to evaluate the coherence scores, a critical metric for assessing the interpretability of the topics generated by the model. As shown in Figure 7, the six-topic model achieved the highest coherence score of 0.523 and was therefore identified as the most meaningful and interpretable configuration.
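A minimal sketch of this model selection loop, under the stated parameters and using gensim, is shown below. The tokenized toy documents stand in for the real corpus, and the c_v coherence measure is an assumption for illustration, since the text does not specify which coherence variant was computed.

```python
"""Sketch of the LDA topic-count sweep (alpha = 0.1, beta = 0.01,
1000 iterations, topic counts 2-15) scored by c_v coherence."""
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# Toy tokenized documents standing in for the 1047 abstracts.
docs = [
    ["chatgpt", "education", "student", "security"],
    ["model", "llm", "language", "data", "privacy"],
    ["ai", "system", "security", "risk", "technology"],
    ["healthcare", "patient", "data", "ai", "privacy"],
]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

best = None
for k in range(2, 16):                          # candidate topic counts
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=k,
                   alpha=0.1, eta=0.01,         # gensim calls beta `eta`
                   iterations=1000, random_state=42)
    score = CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                           coherence="c_v").get_coherence()
    if best is None or score > best[1]:
        best = (k, score)

print(f"best topic count: {best[0]} (coherence {best[1]:.3f})")
```

On the study’s corpus, this sweep peaks at six topics with a coherence of 0.523; on the toy corpus above, the output is of course meaningful only as a demonstration of the procedure.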
Once the optimal number of topics was established, the top ten most frequent keywords for each of the six topics were identified; they are summarized in Table 4. To ensure the accuracy and representativeness of the topic labeling, a thorough review of the keywords and relevant abstracts from the research bibliographies was conducted. This review involved collaboration with experts, including researchers with over five years of experience in IT institutions and professors specializing in text mining. Properly naming the topics was crucial for providing clear and concise representations of the central themes emerging from the LDA model. Through a meticulous examination of the keywords and a comprehensive review of the context provided by the abstracts, the most appropriate topic names were assigned, ensuring that they accurately reflected the underlying research areas related to GAI and security.
The analysis revealed six distinct topics within the research on GAI and its security applications, as illustrated in Figure 8. Among these, the topic “AI and Security in Education” emerged as the most prominent, representing 40% of the papers analyzed. This suggests a significant focus on the integration of AI in educational environments, particularly with respect to security concerns. In contrast, the topic “Healthcare Security with AI” was the least represented, comprising only 5% of the papers. This reflects a specialized but growing area of research within the broader field of GAI and security. The distribution of topics highlights the diverse applications of GAI, with a strong emphasis on its role in education, while acknowledging the increasing importance of security in healthcare, albeit on a smaller scale.
4.4.2. Topic Classification Results
The topic modeling analysis identified six distinct themes within the research on GAI and its applications to security. Each theme was analyzed based on its primary keywords and contextual relevance, providing a comprehensive understanding of the research areas within the domain. These results are presented in Table 4.
The first theme, “AI and Security in Education”, which comprises 40% of the total, is defined by keywords such as “ChatGPT”, “education”, “student”, and “technology”. This theme addresses security concerns arising from the use of AI in educational settings. Research within this theme focuses on safeguarding student data, enhancing the security of online learning platforms, and mitigating potential security threats in AI-integrated learning environments. A representative study by Torres et al. (2023) examined the influence of generative AI on higher education, with a particular emphasis on ethical considerations and maintaining academic integrity. The study highlighted that while tools like ChatGPT offer new possibilities for teaching and learning, they also present challenges related to information reliability, privacy, and security. Proposed security measures included encrypting student data, implementing rigorous access controls, and providing security awareness training to maintain academic integrity in AI-driven learning environments [45].
The second theme, “Security in Language Models and Data Processing”, focuses on security issues related to large language models (LLMs) and data processing techniques. Keywords such as “model”, “LLM”, “language”, and “data” point to research on dataset integrity, vulnerabilities within language models, and privacy concerns. A representative study by Alawida et al. (2023) examined the architecture and training data of such models, along with the associated security risks, including potential cyberattacks. The study proposed countermeasures such as data anonymization, model hardening, and rigorous data validation to mitigate these risks, while also emphasizing the importance of responsible AI use and the need for legal safeguards to prevent misuse [46].
The third theme, “Secure Software Development with AI”, addresses the security challenges inherent in developing software that utilizes AI technologies. Keywords like “ChatGPT”, “code”, “model”, and “software” reflect research on AI’s role in coding, the security of AI-generated code, and methods for enhancing security throughout the software development lifecycle. A study by Horne et al. (2023) discussed the potential risks associated with AI code assistants such as GitHub Copilot, particularly the possibility of introducing unintended vulnerabilities. The paper proposed rigorous human review and secure software development practices as ways to mitigate these risks [47].
The fourth theme, “Security and Risk Management of AI Systems” (14%), is indicated by keywords like “security”, “system”, “AI”, and “technology”, reflecting research into the inherent vulnerabilities of AI systems and the security considerations necessary during their development. Charfeddine et al. (2024) investigated the dual role of ChatGPT in cybersecurity, analyzing both its benefits and risks. The study recommended comprehensive threat analysis, the implementation of misuse prevention mechanisms, and the establishment of secure development environments as essential measures for managing the risks associated with AI systems [48].
The fifth theme, “User Privacy and AI Security” (9%), explores the security and privacy challenges that arise from user interactions with AI systems, as evidenced by keywords such as “user”, “factor”, “analysis”, and “perception”. In a comprehensive examination of public opinion regarding the privacy implications of ChatGPT, Tang and Bashir (2024) identified significant concerns about data privacy and security. The study proposed countermeasures such as minimizing data collection, implementing rigorous data management practices, and incorporating feedback loops to prevent security threats and avoid misinterpreting user intent [49].
The final theme, “Healthcare Security with AI” (5%), addresses the security challenges posed by AI technologies in the healthcare sector, particularly regarding the protection of patient data and the security of AI-powered diagnostic systems. A study by Chen, Walter, and Wei (2024) investigated the deployment of ChatGPT-like AI tools in healthcare, emphasizing the need to safeguard privacy, ensure confidentiality, and comply with regulatory norms. The study recommended encrypting patient data, conducting periodic security assessments, and validating AI models to strengthen healthcare systems against potential security breaches [50].
4.4.3. Weight by Topics
Figure 9 presents a network visualization that provides a comprehensive overview of the top 100 keywords identified through topic modeling analysis. This visual representation illustrates the complex interrelationships between various topics and their associated terms in the context of GAI and its security applications. The network highlights central themes within GAI security, revealing common keywords that link multiple topics and identifying broader thematic overlaps across different domains.
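A Figure 9-style view can be approximated with networkx and matplotlib, as in the sketch below: nodes are keywords sized by a topic-modeling weight, and edges mark co-occurrence between them. All node weights, edges, and the layout here are hypothetical placeholders rather than the study’s actual top-100 keyword data.

```python
"""Sketch of a keyword network visualization with weight-scaled nodes."""
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical subset of top keywords with illustrative topic weights.
weights = {"chatgpt": 0.9, "security": 0.8, "data": 0.7,
           "privacy": 0.6, "system": 0.6, "model": 0.5}
edges = [("chatgpt", "security"), ("security", "data"),
         ("data", "privacy"), ("privacy", "system"),
         ("system", "chatgpt"), ("model", "data")]

G = nx.Graph(edges)
pos = nx.spring_layout(G, seed=42)              # force-directed layout

nx.draw_networkx(
    G, pos,
    node_size=[3000 * weights[n] for n in G],   # scale nodes by weight
    node_color="lightsteelblue",
    font_size=9,
)
plt.axis("off")
plt.savefig("keyword_network.png", dpi=200)
```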
The keyword “ChatGPT” stands out as particularly significant, appearing frequently in contexts related to healthcare security with AI, user privacy and AI security, and AI system security and risk management. This widespread presence underscores the multifaceted role of ChatGPT, an advanced AI language model, across various sectors. For example, in healthcare, ChatGPT is being explored as a tool to enhance communication between patients and healthcare providers, with the potential to improve diagnostic accuracy and patient engagement. However, its use raises serious concerns about data privacy, security vulnerabilities, and the potential for misuse, especially in sensitive areas like healthcare and cybersecurity. The repeated appearance of “ChatGPT” in these contexts emphasizes the need for robust, tailored security measures to address both its benefits and associated risks.
Similarly, the keyword “security” emerges as a central theme that links multiple topics, such as AI system security and risk management, user privacy and AI security, and healthcare security with AI. This highlights the universal importance of security in the development, deployment, and operation of AI systems. Protecting patient data in healthcare, safeguarding user privacy in AI interactions, and securing AI systems against cyber threats are all critical areas where security is paramount. The frequent mention of “security” underscores the essential role that it plays in ensuring the resilience and trustworthiness of AI technologies across various domains.
The term “data” also appears consistently across a number of topics, including security in language models and data processing, AI system security and risk management, and healthcare security with AI. Data are fundamental to AI, serving as the foundation on which AI systems are built and operate. Ensuring the security of data, especially in sensitive sectors like healthcare, is critical. The recurrence of “data” across different topics highlights the urgent need for robust data protection measures, such as encryption, secure storage, and rigorous access controls, to protect sensitive information throughout its lifecycle in AI applications.
The term “privacy” is closely associated with several topics, including user privacy and AI security, healthcare security with AI, and AI system security and risk management. As AI systems increasingly handle personal and sensitive data, privacy concerns have become a major issue. The repeated mention of “privacy” reflects the ongoing challenge of balancing the advantages of AI with the need to protect individuals’ privacy and comply with regulatory standards. This underscores the necessity of implementing technologies and practices that safeguard the confidentiality of user data.
Lastly, the keyword “system” frequently appears in discussions related to AI system security and risk management and secure software development with AI, reflecting broad concerns about the security and integrity of AI systems. The term “system” plays a central role, whether it refers to securing the entire AI infrastructure or incorporating security measures early in the software development process. The emphasis on “system” highlights the importance of taking a comprehensive approach to AI security, considering the entire ecosystem and implementing safeguards against potential threats.
In conclusion, the common keywords identified in this network visualization—such as “ChatGPT”, “security”, “data”, “privacy”, and “system”—serve as critical connectors between various topics, illustrating broader themes that are relevant across multiple domains. These keywords underscore the interconnectedness of AI security issues and highlight the need for a holistic security strategy that addresses concerns ranging from data integrity to user privacy, while also ensuring the overall security of AI systems. Understanding the significance of these keywords provides valuable insight into the challenges of AI security, guiding both research and practice toward more integrated and effective solutions. The visualization not only maps out specific areas of focus within AI security but also emphasizes critical cross-cutting issues that require attention, contributing to the development of robust and resilient AI technologies that are capable of safely supporting a wide range of applications.
5. Discussion
This study provides a comprehensive analysis of research trends in generative artificial intelligence, particularly focusing on security aspects. The bibliometric analysis indicates that the United States, China, and India lead in GAI security research, highlighting the global significance of AI technologies in mitigating security risks. Prominent research institutions, such as the Chinese Academy of Sciences and the National University of Singapore, also play a pivotal role, underscoring the concentrated efforts from both Western and Asian regions in advancing GAI security research.
The results from the TF-IDF and keyword centrality analyses provide valuable insights into the key terms shaping the discourse around GAI and security. While “ChatGPT”, “AI”, and “model” emerged as central terms, their varying TF-IDF and centrality scores suggest different roles within the research context. For instance, although “ChatGPT” is frequently mentioned, it does not hold the same centrality in linking core topics as terms like “model” and “AI”. This distinction emphasizes the complexity of GAI applications, which span diverse industries—from education to healthcare—with distinct security implications.
Latent Dirichlet allocation (LDA) topic modeling identified six key themes in the GAI security domain: AI and security in education, security in language models and data processing, secure software development with AI, security and risk management of AI systems, user privacy and AI security, and healthcare security with AI. The prominence of AI in education highlights the growing concerns over protecting student data and preserving academic integrity in AI-driven environments. At the same time, the focus on language models and data processing underscores the critical need to secure the foundational datasets and algorithms powering GAI systems. The relatively smaller share of research on healthcare security reflects an emerging but increasingly vital area of concern, as AI is integrated into sensitive sectors such as healthcare.
A recurrent theme pervading the discourse on these topics is the critical need for advanced security frameworks that extend beyond the mere remediation of technical vulnerabilities to encompass proactive threat detection and mitigation strategies. The repeated appearance of keywords such as “privacy”, “data”, and “system” across various topics underscores not only the necessity of comprehensive security strategies but also the increasing intricacy of GAI-related threats. This study emphasizes the necessity of transitioning from conventional security approaches to more advanced methods by incorporating real-time anomaly detection, adversarial robustness techniques, and trustworthiness verification into AI security research. Moreover, the escalating risks of data manipulation, misinformation, and adversarial attacks on GAI models necessitate a paradigm shift in AI security methodologies. Rather than solely focusing on risk management, future research should emphasize proactive detection mechanisms, such as explainable AI for security audits, authenticity verification of AI-generated content, and NLP-driven anomaly detection for AI-driven misinformation. By highlighting these emerging challenges, this study emphasizes the need for a multidisciplinary approach that combines cybersecurity, AI ethics, regulatory frameworks, and adversarial AI research to develop resilient and transparent AI security solutions.
6. Conclusions
This study examined the security implications of GAI by analyzing 1047 academic articles sourced from the SCOPUS database. Using scientometric methods, including TF-IDF analysis, keyword centrality analysis, and LDA topic modeling, we identified key trends, themes, and security challenges associated with GAI technologies, particularly in relation to systems like ChatGPT. Furthermore, the evolution of scientific production in this field demonstrates a rapid increase in research interest, with only 2 publications in 2022, followed by a sharp rise to 540 publications in 2023 and 504 publications in the first half of 2024 (as of June). This trend highlights the growing significance of GAI security research, underscoring the need for continuous investigation into emerging threats and robust security frameworks.
The findings reveal that GAI security research is a global effort, with significant contributions from countries such as the United States, China, and India. Leading institutions, including the Chinese Academy of Sciences and the National University of Singapore, are at the forefront of advancing this field. Our analysis also highlighted the centrality of terms like “ChatGPT”, “AI”, and “model” in current research, reflecting the prominence of large language models and AI systems in discussions of security.
The six themes identified through LDA topic modeling—ranging from AI in education to healthcare security—illustrate the diverse applications and associated risks of GAI. While research on AI in education and language models remains dominant, emerging fields such as healthcare security are receiving increasing attention and require further exploration to comprehensively assess and mitigate potential risks in sensitive domains.
Despite the valuable insights that this study provides, it has several limitations. Firstly, the dataset is confined to articles indexed in SCOPUS, which, although comprehensive, may not encompass all relevant research, particularly non-English literature or studies from less prominent journals. This may introduce a regional or perspective-based bias in the findings. Additionally, the study’s data collection period (June 2022 to June 2024) captures recent developments but excludes earlier foundational research on AI security. Given the rapid pace of GAI advancements, the conclusions drawn may become outdated as new technologies and challenges emerge.
Furthermore, while LDA topic modeling offers a structured view of research trends, this method relies on statistical probabilities, which may not fully capture the depth and nuances of each article’s content. A more in-depth qualitative analysis could complement these findings by providing richer insights into the contextual details of GAI security challenges.
In conclusion, as GAI technologies continue to evolve, developing comprehensive security frameworks is essential to address both technical vulnerabilities and broader ethical and societal challenges. Future research should focus on interdisciplinary approaches that integrate technical, legal, and ethical perspectives to ensure the safe and responsible deployment of GAI technologies across industries. Additionally, expanding the analytical scope through the incorporation of diverse data sources and the adoption of a hybrid quantitative–qualitative approach will provide a more holistic and nuanced understanding of emerging security challenges associated with GAI systems.