Text Mining Approaches for Exploring Research Trends in the Security Applications of Generative Artificial Intelligence
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The authors shoot from the hip: generative AI is extremely popular. What they do not say is that the fame is for all the wrong reasons. Still, they must be aware of the security effects, as this is the niche they are exploring; therefore, they mention it as their research area. In short, the proposed paper is “en vogue”. It is very well written, excellently organized and efficiently researched. At the same time, the content has the atmosphere of a review paper that could easily be composed by generative AI. Even when an author reviews a scientific subject, (s)he should have either a novel view or a new angle. That is basically what is missing in this text.
Natural language processing (NLP) lies at the heart of the phenomenology applied in the submitted publication. This is a 20-year-old technique for charting an unknown area. It has helped to mature a number of mathematical modeling styles. Only at the end of the introduction do the authors concede that they do not present a new insight but dig into the literature to find new angles. Therefore, they present data collections, which leaves open the question of whether these collections represent the coming new world. The question is simply bypassed by utilizing adequate analysis tools. Though the tools elaborate subject frequencies, the dimensionality and density remain out of sight.
But what is the security issue? Security applications are interpreted as literature on security issues. However, the major part of the text is devoted to the structure of the data space. Given that, we would be interested in the detection of the secure handling of language parts. Are all parts original, or are some parts (illegally?) replaced by new segments? Over the years, NLP has been widely used to detect new segments (assuming these have a higher level of coding errors). Less popular is the use of NLP for dating code to detect whether offered code is actually new.
The authors provide a good summarization, but I miss a wide view of the subject area and the inspirational extension that would actually point readers toward novelty.
Author Response
Comments 1: The proposed paper is well-written, well-organized, and efficiently researched, but it lacks a novel view or a new angle. The content feels like a typical review paper that could be composed by generative AI.
Response 1: Thank you for your feedback. We acknowledge the importance of presenting a novel perspective in a review study. To address this, we emphasize that this study is the first to conduct a scientometric analysis specifically of GAI security research, providing a structured, data-driven approach to understanding research trends, key themes, and security challenges. Unlike traditional literature reviews, which primarily rely on qualitative synthesis, our work employs quantitative methods (TF-IDF, keyword centrality, and LDA topic modeling) to uncover hidden research patterns and gaps in AI security. We have clarified this distinction in the Abstract and Introduction (lines 92-95).
Comments 2: The paper focuses more on the structure of the data space than on the security issues themselves. What is the actual security implication of this research?
Response 2: We appreciate this insightful comment. While our study systematically maps the research landscape of GAI security, we recognize the need to explicitly connect our findings to real-world security implications. In response, we have expanded our Discussion section to provide a deeper analysis of how the identified research themes (e.g., adversarial attacks, privacy risks, misinformation) affect AI security. Additionally, we highlight how these insights can inform security strategies, regulatory measures, and technical solutions for GAI deployment (lines 737-753).
Comments 3: NLP has been widely used for decades. How does this study contribute beyond standard NLP-based text analysis?
Response 3: We agree that NLP techniques, including LDA topic modeling, have been used extensively in text mining research. However, the unique contribution of our study lies in applying these techniques to GAI security research, a domain that has not been systematically analyzed before. Our approach quantifies the security-related discourse in AI research, identifies underexplored topics, and reveals cross-disciplinary connections that are not easily visible through qualitative reviews. To further emphasize this, we have refined our Discussion to highlight how our findings contribute to both AI security research and policy development (lines 183-197).
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript presented for review addresses a hot topic, LLM usage and the related security issues, so it could be of interest to readers. On the other hand, a number of improvements should currently be made to increase its quality:
- The article type should be changed from "Article" to "Review", since it is a typical systematic literature review article.
- Motivation for choosing Scopus rather than WoS should be given, since Scopus is more "social science oriented" than the more "technical" WoS. Why not perform the search on both?
- In Chapter 1.3, please present an overview of existing review articles on LLMs and security and stress how your manuscript differs from the existing ones.
- In Chapter 2.1, please provide the exact query, including the logical operators used. Explain why the GPT-4 and GPT-4o models were not included in the search, and why DeepSeek and similar "newcomers" are not included either. This should be expanded.
- It is stated that the timeframe covers the period up to June 2024. It should be expanded to at least the end of 2024.
- In Chapter 2.2.4, when introducing LDA, please provide a more detailed description of the method steps.
- Figure 4: provide an explanation of the notation used, below the figure.
- The variable descriptions under Eq. 1 are missing.
- Add the numbers on the bars of Figs. 5 and 6.
- A typical task of review articles is to identify not only the current state of research but also prospective directions. While current hot topics are identified, the directions defined sound weak and are limited to "The frequent appearance of keywords such as "privacy," "data," and "system" across various topics points to the necessity of comprehensive security strategies that cover data protection, system integrity, and user privacy." In general, the review's emphasis on security is not strong enough, even though it could be seen as the main novelty of the manuscript. Please expand this part of the review.
Author Response
Comments 1: The article type should be changed from "Article" to "Review", since it is a typical systematic literature review article.
Response 1: We appreciate the reviewer’s suggestion regarding the article type. However, we would like to clarify that this study goes beyond a traditional systematic literature review. While we do employ a structured literature review methodology, our research also involves a novel scientometric analysis using TF-IDF, keyword centrality analysis, and LDA topic modeling on 1,047 academic articles. These quantitative approaches provide new insights into research trends, security risks, and key thematic structures related to Generative AI security. Unlike a typical review article that summarizes existing studies, our work conducts an original data-driven analysis to uncover patterns and correlations that have not been explicitly addressed in previous research. Given this analytical approach, we believe that classifying this manuscript as an "Article" rather than a "Review" is appropriate. To further clarify this distinction, we have revised the manuscript to emphasize the original contributions of our analysis, particularly in the Abstract and Section 1, Introduction (lines 97-104). We appreciate the reviewer’s insights and hope this explanation adequately justifies our classification.
Comments 2: Motivation for choosing Scopus rather than WoS should be given, since Scopus is more "social science oriented" than the more "technical" WoS. Why not perform the search on both?
Response 2: Scopus provides robust bibliometric tools that are particularly suitable for text mining and scientometric analysis. Since our study involves scientometric methods such as TF-IDF, keyword centrality analysis, and LDA topic modeling, we found that Scopus’s structured metadata, citation data, and keyword indexing were more effective for this type of quantitative text analysis. WoS was initially considered but ultimately omitted because of data duplication and coverage issues: many AI security-related papers indexed in WoS are also included in Scopus, which would have led to redundancy in data processing. Since our objective was to analyze the broadest possible set of AI security research trends, we determined that Scopus alone was sufficient for capturing relevant publications (lines 201-208). Nevertheless, we acknowledge the potential benefits of incorporating WoS for additional validation. In future research, a comparative analysis using both Scopus and WoS could provide deeper insights into variations in research trends across different citation databases.
Comments 3: In Chapter 1.3, please present an overview of existing review articles on LLMs and security and stress how your manuscript differs from the existing ones.
Response 3: Thank you for your valuable suggestion. In response, we have expanded Section 2 to provide an overview of recent review articles on LLMs and security. Specifically, we have included discussions of prior studies covering adversarial attacks (Zhou et al., 2023), privacy risks (Wang & Li, 2024), robustness of LLMs (Chen et al., 2023), and regulatory frameworks (Singh et al., 2024). To highlight the novelty of our study, we emphasize that existing reviews primarily rely on qualitative analysis, whereas our study employs scientometric and text mining methods (TF-IDF, keyword centrality analysis, and LDA topic modeling) to provide a quantitative mapping of research trends in LLM security. Our approach allows for the identification of major research clusters and thematic structures that are not explicitly captured in prior studies. These additions have been incorporated into Section 2 (lines 182-185), and we believe they strengthen the manuscript by clearly distinguishing our contributions from the existing literature.
Comments 4: In Chapter 2.1, please provide the exact query, including the logical operators used. Explain why the GPT-4 and GPT-4o models were not included in the search, and why DeepSeek and similar "newcomers" are not included either. This should be expanded.
Response 4: Thank you for your valuable feedback. In response, we have expanded Section 3.1.2 to explicitly include the exact search query used, along with its logical operators. The search was conducted in SCOPUS using the following query:
("generative AI" OR "ChatGPT" OR "GPT-3" OR "large language model" OR "LLM") AND ("security" OR "cybersecurity" OR "privacy" OR "threat detection" OR "adversarial attack")
Regarding the exclusion of GPT-4, GPT-4o, and DeepSeek, these models were not explicitly mentioned in the search terms for the following reasons:
- Publication timeline constraints: GPT-4 (March 2023), GPT-4o, and DeepSeek (May 2024) were released recently, and peer-reviewed research on these models remains limited in SCOPUS.
- Limited indexing in SCOPUS: although these models are widely discussed in industry reports and preprint repositories (e.g., arXiv), few peer-reviewed publications on them were indexed in SCOPUS at the time of data collection.
- Generalized search strategy: the query used broader terms ("LLM", "large language model"), ensuring the inclusion of research on new models even if they are not explicitly named.
These clarifications have been incorporated into Section 2.1 for transparency (lines 249-260).
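To make the Boolean structure of this query concrete, the sketch below rebuilds the query string from two term lists and applies the same AND-of-ORs logic to a title or abstract. It is a minimal local illustration, not a call to the Scopus API: the names `GAI_TERMS`, `SECURITY_TERMS`, `build_query`, and `matches` are hypothetical, and the case-insensitive substring matching is an assumption that will differ from Scopus's own field scoping and term matching.

```python
# Term groups copied from the query quoted in Response 4.
GAI_TERMS = ["generative AI", "ChatGPT", "GPT-3", "large language model", "LLM"]
SECURITY_TERMS = ["security", "cybersecurity", "privacy", "threat detection", "adversarial attack"]


def or_group(terms):
    """Join quoted phrases with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(group_a, group_b):
    """Combine two OR-groups with AND, mirroring the published query string."""
    return f"{or_group(group_a)} AND {or_group(group_b)}"


def matches(text, group_a=GAI_TERMS, group_b=SECURITY_TERMS):
    """Local approximation: at least one term from each group must occur in the text.

    Uses case-insensitive substring matching (an assumption), so "LLM" also
    matches "LLMs" and "security" also matches "cybersecurity".
    """
    lowered = text.lower()
    has_a = any(term.lower() in lowered for term in group_a)
    has_b = any(term.lower() in lowered for term in group_b)
    return has_a and has_b


if __name__ == "__main__":
    print(build_query(GAI_TERMS, SECURITY_TERMS))
    print(matches("Privacy leakage in large language models"))  # True
    print(matches("LLMs for code generation"))                   # False
```

The second example fails only the security group, which shows why the AND between the two groups narrows the result set to security-related GAI work.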
Comments 5: It is stated that the timeframe covers the period up to June 2024. It should be expanded to at least the end of 2024.
Response 5: We appreciate the reviewer’s suggestion to extend the timeframe to the end of 2024. However, at the time of data collection, many research articles from the second half of 2024 had not yet been indexed in SCOPUS, given the typical publication and indexing delays in academic databases. To ensure the stability and consistency of our analysis, we maintained the cutoff date at June 2024, allowing for a comprehensive yet methodologically sound evaluation of the current research landscape. Future studies may revisit this analysis with an extended dataset once more 2024 publications become available. This clarification has been added to Section 2.1 for transparency (lines 246-251).
Comments 6: In Chapter 2.2.4, when introducing LDA, please provide a more detailed description of the method steps.
Response 6: Thank you for your insightful comment. In response, we have expanded lines 362-368 to provide a more detailed explanation of the LDA topic modeling process used in this study. Specifically, we have included a step-by-step breakdown of data preprocessing, topic extraction, and interpretation, along with details on the model parameters used (e.g., topic count selection and hyperparameter tuning). This revision ensures transparency in our methodology and gives readers a clear understanding of how the topics were derived.
Comments 7: Figure 4: provide an explanation of the notation used, below the figure.
Response 7: Thank you for your suggestion. In response, we have added a detailed explanation of the notation used in Figure 4. The updated section clarifies the roles of α (Dirichlet prior for the document-topic distribution), θ (topic distribution), β (Dirichlet prior for the word distribution), z (latent topic assignment), w (observed word), M (number of documents), and N (number of words per document). Additionally, we describe how these elements interact within the LDA generative process. This revision enhances clarity and helps readers interpret the graphical model (lines 377-385).
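The generative process described by this notation can be sketched in a few lines of standard-library Python. This is an illustrative simulation under stated assumptions, not the model-fitting procedure used in the study (fitting LDA means inverting this process, for example with Gibbs sampling or variational inference). Symmetric scalar priors α and β, a toy vocabulary, and a fixed N per document are assumptions, and the per-topic word distributions drawn from β are called `phi` here, a name that does not appear in the response.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma(c_i, 1) draws."""
    draws = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(M, N, K, vocab, alpha, beta, seed=0):
    """Simulate LDA's generative process for M documents of N words each.

    alpha: symmetric Dirichlet prior on each document's topic distribution theta.
    beta:  symmetric Dirichlet prior on each topic's word distribution (phi here).
    """
    rng = random.Random(seed)
    # One word distribution per topic, drawn once for the whole corpus.
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(K)]
    corpus, topic_assignments = [], []
    for _ in range(M):
        theta = sample_dirichlet([alpha] * K, rng)       # topic mixture of this document
        words, zs = [], []
        for _ in range(N):
            z = rng.choices(range(K), weights=theta)[0]  # latent topic assignment z
            w = rng.choices(vocab, weights=phi[z])[0]    # observed word w drawn from topic z
            zs.append(z)
            words.append(w)
        corpus.append(words)
        topic_assignments.append(zs)
    return phi, corpus, topic_assignments


if __name__ == "__main__":
    vocab = ["privacy", "data", "attack", "model", "student", "patient"]
    phi, corpus, z = generate_corpus(M=3, N=8, K=2, vocab=vocab, alpha=0.5, beta=0.5)
    for doc in corpus:
        print(doc)
```

Only `corpus` is observed in practice; `theta`, `z`, and `phi` are the hidden quantities that topic modeling tries to recover.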
Comments 8: The variable descriptions under Eq. 1 are missing.
Response 8: Thank you for your comment. We have added an explanation of the $C_v$ coherence score equation to clarify the notation and its roles. Specifically, we define $K$ (number of topics), $N$ (number of words per topic), $\vec{w}_{n,k}$ (word embedding vector for word $n$ in topic $k$), and $\vec{w}_k^{*}$ (context vector for topic $k$). The equation calculates the mean pairwise cosine similarity between word vectors, providing a measure of topic coherence. This ensures that the extracted topics are interpretable and semantically meaningful. This explanation has been incorporated into the revised manuscript (lines 397-411).
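Assuming Eq. 1 takes the usual $C_v$ form that these definitions suggest, namely $C_v = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{N}\sum_{n=1}^{N}\cos\left(\vec{w}_{n,k}, \vec{w}_k^{*}\right)$, the computation can be sketched as follows. In the standard $C_v$ measure the word vectors are built from normalized pointwise mutual information (NPMI) with the other top words rather than from pretrained embeddings, and $\vec{w}_k^{*}$ is the sum of a topic's word vectors; this sketch follows that construction but, as a simplifying assumption, estimates probabilities from document-level co-occurrence instead of the sliding window used in common implementations. The function names are hypothetical.

```python
import math


def npmi(p_i, p_j, p_ij):
    """Normalized pointwise mutual information, in [-1, 1]."""
    if p_ij == 0:
        return -1.0  # the two words never co-occur
    denominator = -math.log(p_ij)
    if denominator == 0:
        return 1.0  # both words occur in every document
    return math.log(p_ij / (p_i * p_j)) / denominator


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cv_coherence(topics, docs):
    """Average C_v-style coherence over K topics, each given as its top-N words."""
    doc_sets = [set(doc) for doc in docs]

    def prob(*words):
        # Share of documents containing all the given words (document-level co-occurrence).
        return sum(all(w in s for w in words) for s in doc_sets) / len(doc_sets)

    topic_scores = []
    for top_words in topics:
        # Vector w_{n,k}: NPMI of word n with every top word of topic k.
        vectors = [[npmi(prob(wi), prob(wj), prob(wi, wj)) for wj in top_words]
                   for wi in top_words]
        # Topic context vector w_k^*: element-wise sum of the word vectors.
        topic_vector = [sum(column) for column in zip(*vectors)]
        scores = [cosine(vec, topic_vector) for vec in vectors]
        topic_scores.append(sum(scores) / len(scores))
    return sum(topic_scores) / len(topic_scores)
```

With toy documents `[["privacy", "data"], ["privacy", "data"], ["attack", "model"], ["attack", "model"]]`, the topic `["privacy", "data"]` scores 1.0 up to floating-point rounding, while `["privacy", "attack"]`, whose words never co-occur, scores 0.0.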
Comments 9: Add the numbers on the bars of Figs. 5 and 6.
Response 9: Thank you for your suggestion. We have updated Figures 5 and 6 by adding numerical values to the bars to enhance clarity and readability. This modification provides a more precise representation of the data, making it easier for readers to interpret the trends and comparisons in GAI security research.
Comments 10: A typical task of review articles is to identify not only the current state of research but also prospective directions. While current hot topics are identified, the directions defined sound weak and are limited to "The frequent appearance of keywords such as "privacy," "data," and "system" across various topics points to the necessity of comprehensive security strategies that cover data protection, system integrity, and user privacy." In general, the review's emphasis on security is not strong enough, even though it could be seen as the main novelty of the manuscript. Please expand this part of the review.
Response 10: Thank you for your insightful comment. We have expanded the discussion of future research directions in GAI security by incorporating proactive threat detection strategies, adversarial AI defenses, and explainable AI (XAI) for security audits. The revised section highlights the increasing risks of AI-generated misinformation, adversarial attacks, and data manipulation, emphasizing the need for interdisciplinary approaches that integrate cybersecurity, AI ethics, and regulatory frameworks. These additions strengthen the security focus of the review and provide a clearer roadmap for future research in GAI security (lines 737-753).
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper evaluates the security implications of ChatGPT as a prominent example of generative artificial intelligence (GAI) technology. The paper is interesting and generally well written. However, some issues need to be resolved:
- I propose moving subsections 1.2 and 1.3 to a brand-new section (2. Related Works), keeping the Introduction section for describing the context of the research and explaining to readers the significance of the research question (this is already done in 1.1).
- The first keyword (“Generative Artificial Intelligence(GAI)”) needs to be simplified by deleting “(GAI)”;
- Figure 1 needs to be either colored or given increased contrast;
- The title of subsection 2.1 is confusing. Please rephrase it;
- Figure 3: Why are there two separate lines/blocks for “Study 3”, one containing ‘Degree centrality’, ‘Closeness centrality’, and ‘Betweenness centrality’, and one containing ‘Coherence score’, ‘Topic publication’, ‘Opinion Article’, and ‘Topic network map’?
- Figure 4: either delete this figure or present the entire algorithm; otherwise, beta, alpha, theta, and z are unknown to the reader.
- Line 334: Cv is written without v as a subscript, while in (1) v is written as a subscript;
- Please list or describe some research challenges in the field that may be addressed in the future;
- A figure presenting how scientific production has evolved over time (publications per year) would help readers understand the field dynamics.
Author Response
Comments 1: I propose to move subsections 1.2 and 1.3 to a brand-new section (2. Related Works) to keep the Introduction section for describing the context of the research and also for explaining to readers the significance of the research question (these are already done in 1.1).
Response 1: Thank you for your suggestion. We have merged subsections 1.2 and 1.3 into a new Section 2, titled "Related Work", to better separate the introduction from the discussion of prior research. This restructuring helps maintain a clear distinction between the background and the literature review, making the Introduction more focused on the research context and significance. Additionally, we have adjusted the numbering of subsequent sections accordingly (e.g., previous Sections 3 and 4 have been renumbered to maintain logical consistency). This revision improves the overall structure and readability of the manuscript.
Comments 2: The first keyword (“Generative Artificial Intelligence(GAI)”) needs to be simplified by deleting “(GAI)”.
Response 2: Thank you for your suggestion. We have removed "(GAI)" from the first keyword, simplifying it to "Generative Artificial Intelligence" for clarity and consistency.
Comments 3: Figure 1 needs to be either colored or with an increased contrast.
Response 3: Thank you for your suggestion. We have enhanced Figure 1 by increasing its contrast to improve readability. If necessary, we are also open to further adjustments, such as adding color, to enhance visual clarity while maintaining consistency with the manuscript’s formatting.
Comments 4: The title of subsection 2.1 is confusing. Please rephrase it.
Response 4: Thank you for your suggestion. We have relocated the original Section 2.1 to Section 3.1 for better structural alignment. Additionally, we have renamed "Research Subjects" to "Research Data Collection and Refinement" to more accurately reflect the content of the subsection, which focuses on data collection and preprocessing.
Comments 5: Figure 3: Why are there two separate lines/blocks for “Study 3”, one containing ‘Degree centrality’, ‘Closeness centrality’, and ‘Betweenness centrality’, and one containing ‘Coherence score’, ‘Topic publication’, ‘Opinion Article’, and ‘Topic network map’?
Response 5: Thank you for your observation. The separation of two blocks under "Study 3" in Figure 3 was intentional, to differentiate two distinct analytical processes. The first block (Degree Centrality, Closeness Centrality, Betweenness Centrality) represents network-based keyword centrality analysis, which identifies the most influential terms in the research landscape based on their relational positioning. The second block (Coherence Score, Topic Publication, Opinion Article, Topic Network Map) pertains to LDA topic modeling and thematic analysis, which clusters research themes and evaluates their coherence, publication trends, and interpretability. This distinction lets readers clearly differentiate between graph-based network analysis and topic-modeling-based analysis, as they serve different methodological purposes. However, we are open to reformatting the diagram if further clarity is needed.
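The three centrality measures in the first block can be illustrated on a small keyword co-occurrence network. The sketch below is a plain-Python approximation, assuming that two keywords are linked when they appear in the same document and that edges are unweighted; the study's network analysis was done in NetMiner, whose exact settings (edge weights, normalization, handling of disconnected components) are not stated here, so the function names and these choices are hypothetical.

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(doc_keywords):
    """Undirected, unweighted graph linking keywords that share a document."""
    graph = {}
    for keywords in doc_keywords:
        unique = sorted(set(keywords))
        for kw in unique:
            graph.setdefault(kw, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes each keyword is directly linked to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0 for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes - 1) / sum of shortest-path lengths, via breadth-first search."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted graphs, normalized to [0, 1] (undirected)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(graph)
    # Each unordered pair is counted from both endpoints, so dividing by (n-1)(n-2)
    # both halves the raw sum and normalizes by the (n-1)(n-2)/2 possible pairs.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: value * scale for v, value in bc.items()}
```

For example, with documents whose keywords are `["security", "privacy"]`, `["security", "attack"]`, and `["security", "model"]`, "security" sits on every shortest path between the other keywords, so it scores 1.0 on all three measures, while "privacy" has degree 1/3 and closeness 0.6.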
Comments 6: Figure 4: either delete this figure or present the entire algorithm, otherwise beta, alpha, theta, z are unknown to the reader.
Response 6: Thank you for your feedback. To improve clarity, we have expanded the explanation of Figure 4 by providing definitions for β (beta), α (alpha), θ (theta), and z to ensure that readers fully understand their roles in the algorithm. If needed, we are open to further revisions, such as presenting the full algorithm or restructuring the figure for better readability. However, if the figure remains unclear despite these improvements, we will consider removing it to maintain the manuscript's coherence.
Comments 7: Line 334: it is written Cv (v is not a subscript) while in (1) v is written as a subscript.
Response 7: Thank you for pointing out this inconsistency. We have corrected the notation in Line 334 to ensure consistency with Equation (1), where "v" is written as a subscript (Cᵥ). This revision improves the clarity and uniformity of mathematical expressions throughout the manuscript.
Comments 8: Please list or describe some research challenges in the field that may be addressed in the future.
Response 8: Thank you for your suggestion. We have expanded the Discussion section to highlight key future research challenges in GAI security. The revised content emphasizes the need for advanced security frameworks that integrate proactive threat detection, real-time anomaly detection, and adversarial robustness techniques. Additionally, we discuss the growing risks of data manipulation, misinformation, and adversarial attacks, stressing the importance of explainable AI (XAI) for security audits, AI-generated content authenticity verification, and NLP-driven anomaly detection. Furthermore, we underscore the necessity of a multidisciplinary approach combining cybersecurity, AI ethics, regulatory policies, and adversarial AI research to develop robust and transparent GAI security solutions (lines 737-753).
Comments 9: A figure presenting how scientific production has evolved over time (publications per year) would help readers understand the field dynamics.
Response 9: Thank you for your suggestion. We have incorporated the scientific production trend in the Conclusion section, reflecting the growth in research activity over time. Specifically, the study dataset includes 2 publications in 2022, 540 in 2023, and 504 in the first half of 2024 (as of June). This addition helps contextualize the rapid expansion of research in GAI security and emphasizes its increasing relevance in academia and industry (lines 760-764).
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
This is a kind of literature review paper on the security implications of generative artificial intelligence (GAI), focusing on ChatGPT models, so the scope of the review is narrowed to some specific issues. The authors considered 1,047 academic articles from the SCOPUS database using basic scientometric methods. Topic modeling (based on LDA) identified six major themes: AI and Security in Education, Security in Language Models and Data Processing, Secure Software Development with AI, AI Systems Security and Risk Management, User Privacy and AI Security, and Healthcare Security with AI. The text analyses were limited to paper titles, abstracts, and keywords. Unfortunately, the practical significance of such a review was not clearly stated.
The authors use some well-known text mining techniques and present several general statistics, such as the distribution of papers over countries or academic institutions. This is not very interesting, all the more so because the survey is limited to publications from 2022 to June 2024. It is a pity that a longer period was not covered; moreover, publications from other repositories, e.g., Web of Science, Springer, and IEEE, could also be considered. In Fig. 2, it would be reasonable to add an additional category, "Remaining European Countries". Some comment is also needed on how the presented statistics were developed, since some papers may be co-authored by authors from different countries or institutions. The paper presents only aggregated statistical results. It would be more interesting to extend the list of references to cover the most important (representative) papers characterizing the derived topics; their contributions could be summarized to show their relevance to these topics. A reference list of only 50 entries is not satisfactory. The review process presented in Fig. 3 should be explained in more detail: which tools were used, a clear specification of the inclusion and exclusion criteria, etc.
It seems that the whole analysis is limited to basic publication data (title, abstract, keywords). Lines 272/273: explain "document term matrix". In many systematic literature reviews (SLRs), a preliminary selection is followed by a detailed full-text (PDF) analysis performed by experts. There are papers on SLR methodology that the authors could also include or refer to. The text in Sections 1.1 and 1.2 is superficial and consists of short, well-known statements. The paper structure could be refined by shortening Sections 1.1 and 1.2 and extending Section 1.3 into Section 2, e.g., titled "Previous research and problem statement", with more substantive considerations underlining the need, significance, and originality of your approach. Line 307: what do you mean by "news articles"? (Please give some examples.) Fig. 4 and the equation in line 344 need a more detailed description (explanation of the notation used, etc.). The results of the analysis in Tables 2 and 3 provide keywords that are not very interesting; they seem to be too general. An n-gram analysis might yield more interesting and descriptive phrases. It is not clear whether the specified keywords correspond to the keywords included in the publications (specified by their authors) or to terms derived using TF-IDF. The derived topics are discussed in a superficial way; a deeper characterization is needed, referring to the most valuable papers (e.g., those with a high impact factor or many citations) based on full-text analysis. Please comment on how Fig. 9 was generated. What is the goal of the paper, what kind of readers can it attract (target audience), and what are the benefits of reading this paper?
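The n-gram suggestion can be illustrated with a short bigram count over titles or abstracts. This is a minimal sketch, not part of the study: the tokenizer (lowercase alphanumeric runs, keeping hyphenated terms such as "gpt-3" together), the stop-word handling (n-grams containing a stop word are skipped rather than bridged), and the function name `top_ngrams` are assumptions.

```python
import re
from collections import Counter


def top_ngrams(texts, n=2, k=10, stopwords=frozenset()):
    """Return the k most frequent n-grams as (phrase, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lowercase alphanumeric tokens; hyphenated terms stay as one token.
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            # Skip n-grams containing a stop word instead of joining across it.
            if not any(t in stopwords for t in gram):
                counts[gram] += 1
    return [(" ".join(g), c) for g, c in counts.most_common(k)]


if __name__ == "__main__":
    titles = [
        "Prompt injection attacks on large language models",
        "Defending large language models against prompt injection",
    ]
    print(top_ngrams(titles, n=2, k=3, stopwords={"on", "against"}))
```

Here the bigrams "prompt injection", "large language", and "language models" each occur twice, which are more descriptive than the single keywords "prompt", "language", or "models" on their own.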
Author Response
Comments 1: The practical significance of such a review was not clearly stated.
Response 1: Thank you for your valuable feedback. We have clarified the practical significance of this review in the Introduction, Discussion, and Conclusion sections by emphasizing its relevance to security researchers, policymakers, and industry professionals. The Introduction now explicitly states how this review provides a structured analysis of security challenges in Generative AI (GAI) and their implications for real-world cybersecurity measures. In the Discussion, we expanded on the importance of identifying emerging security risks such as data manipulation, adversarial attacks, and misinformation in order to develop proactive threat detection mechanisms and regulatory strategies. Additionally, in the Conclusion, we highlighted how this study can serve as a foundation for designing robust AI security frameworks, shaping regulatory policies, and guiding future research directions. These revisions ensure that the practical impact of this study is explicitly communicated, demonstrating its value in both academic and applied security contexts (lines 737-753).
|
Comments 2: The survey is limited to publications covering 2022–June 2024. A longer period and additional repositories (e.g., Web of Science, IEEE, Springer) should be considered.
Response 2: Thank you for your suggestion. We have clarified our choice of data sources and timeframe in lines 776-781 to provide transparency regarding our methodology. We selected Scopus because of its structured metadata, citation data, and keyword indexing, which are particularly suitable for TF-IDF, keyword centrality, and LDA topic modeling. WoS was initially considered but omitted because of data duplication and coverage issues, as many AI security-related papers indexed in WoS were also included in Scopus, leading to potential redundancy in data processing. Future research may incorporate WoS for additional validation and comparative analysis of research trends across databases. Regarding the timeframe, we acknowledge the value of extending the dataset to the end of 2024, but at the time of data collection, many late-2024 publications were not yet indexed because of typical publication and indexing delays in academic databases. To ensure methodological consistency and stability of the dataset, we maintained June 2024 as the cutoff date and have clarified this in line for transparency.
|
Comments 3: In Fig. 2, it would be reasonable to add ‘Remaining European Countries.’ Also, clarify how co-authorship across multiple countries/institutions was handled.
Response 3: Thank you for your suggestion. The number of publications by country is actually presented in Figure 5, not Figure 2, and we have added numerical values to the figure to enhance clarity. Regarding the inclusion of a ‘Remaining European Countries’ category, we decided to list only individual countries, as there is no clear distinction between EU and non-EU countries that would allow for consistent grouping. We appreciate your understanding on this matter. As for co-authorship across multiple countries and institutions, we acknowledge the importance of refining this analysis and plan to enhance it through network analysis in future research.
|
Comments 4: The review presents only aggregated statistical results. Extending the reference list and summarizing key contributions of representative papers would make the study more informative.
Response 4: Thank you for your suggestion. We have expanded the reference list by including additional representative papers that highlight key contributions in GAI security research. To enhance the study’s informativeness, we have also summarized the key findings of these works, providing a clearer understanding of their relevance to emerging security challenges in GAI. These revisions ensure a more comprehensive synthesis of prior research, beyond statistical analysis.
|
Comments 5: Clarify the review process in Fig. 3, including tools used, inclusion/exclusion criteria, etc.
|
Response 5: Thank you for your suggestion. We have clarified the review process in Figure 3 by specifying the tools used, inclusion/exclusion criteria, and methodological steps. The data was collected using Scopus, with Python employed for text preprocessing and NetMiner for network and topic modeling analysis. The inclusion criteria required papers to be peer-reviewed, directly related to GAI security, and published between 2022 and June 2024. The exclusion criteria removed articles that lacked abstracts, were not focused on security issues, or were duplicate entries from different sources.
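The inclusion and exclusion criteria listed in this response can be expressed as a simple record filter. This is a hypothetical sketch: the record fields (`title`, `abstract`, `peer_reviewed`, `security_focused`, `published`) are assumed names, the start date of 1 January 2022 is an assumption for "2022", and relevance to GAI security is represented as a flag set during manual screening rather than decided automatically.

```python
from datetime import date


def apply_criteria(records, start=date(2022, 1, 1), end=date(2024, 6, 30)):
    """Keep records that satisfy the inclusion criteria and drop excluded ones.

    Each record is a dict with hypothetical fields: title, abstract,
    peer_reviewed (bool), security_focused (bool, from manual screening),
    and published (datetime.date).
    """
    kept, seen_titles = [], set()
    for record in records:
        if not record.get("peer_reviewed"):
            continue  # inclusion: peer-reviewed only
        if not record.get("security_focused"):
            continue  # exclusion: not focused on GAI security issues
        if not (record.get("abstract") or "").strip():
            continue  # exclusion: missing abstract
        if not start <= record["published"] <= end:
            continue  # inclusion: published from 2022 to June 2024
        key = " ".join(record["title"].lower().split())
        if key in seen_titles:
            continue  # exclusion: duplicate entry (matched on normalized title)
        seen_titles.add(key)
        kept.append(record)
    return kept
```

Matching duplicates on a normalized title is one simple choice; a DOI, where available, would usually be a more reliable key.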
|
Comments 6: Explain the meaning of ‘document-term matrix’ (Line 272/273).
Response 6: Thank you for your comment. We have clarified the meaning of the document-term matrix (DTM) in the manuscript. In this study, the DTM was constructed after cleaning and refining the dataset: each row corresponds to a document (an academic paper), and each column represents a unique term extracted from the dataset. Each cell holds the term frequency (TF) of that term in that document, or its weighted importance under TF-IDF. This matrix formed the basis for the subsequent analyses, including term frequency-inverse document frequency (TF-IDF) analysis and LDA topic modeling, which allowed us to extract deeper insights into key themes and research trends in GAI security. This explanation has been incorporated into lines 316-323.
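To make the DTM described in Response 6 concrete, here is a minimal pure-Python sketch that builds a document-term matrix of raw term frequencies and then reweights it with TF-IDF. The helper names `build_dtm` and `tfidf`, the whitespace tokenizer, the toy abstracts, and the idf variant log(N/df) are assumptions for illustration only; the response does not state which TF-IDF formula or preprocessing pipeline the authors applied.

```python
import math
from collections import Counter

def build_dtm(docs):
    """Build a document-term matrix: one row per document, one column per vocabulary term."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    vocab = sorted({term for tokens in tokenized for term in tokens})
    tf_rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        tf_rows.append([counts[term] for term in vocab])  # raw term frequency per cell
    return vocab, tf_rows

def tfidf(tf_rows):
    """Reweight raw counts as tf * log(N / df), one common TF-IDF variant (assumption)."""
    n_docs = len(tf_rows)
    n_terms = len(tf_rows[0]) if tf_rows else 0
    # Document frequency: in how many documents each term appears at least once.
    df = [sum(1 for row in tf_rows if row[j] > 0) for j in range(n_terms)]
    return [[row[j] * math.log(n_docs / df[j]) for j in range(n_terms)] for row in tf_rows]

# Hypothetical toy abstracts, not data from the study.
docs = [
    "chatgpt privacy risk",
    "adversarial attack on llm security",
    "chatgpt security and privacy",
]
vocab, tf = build_dtm(docs)
weights = tfidf(tf)
```

In this toy example, "risk" appears in only one of the three abstracts and receives weight log 3 in that row, whereas "chatgpt", shared by two abstracts, receives log 1.5, which is how TF-IDF down-weights widely shared terms.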
Comments 7: Sections 1.1 and 1.2 are too general. Consider restructuring the paper by shortening these sections and expanding Section 1.3 as a new Section 2 (‘Previous Research and Problem Statement’).
Response 7: Thank you for your suggestion. We have restructured the manuscript by consolidating Sections 1.1 and 1.2 into a more concise Introduction (Section 1), while keeping the research background and significance clear. In addition, Section 1.3 has been expanded and moved into a new Section 2, titled "Related Work," to better separate prior research from the study’s objectives. This restructuring improves the logical flow and readability of the paper and addresses the reviewer's concerns.
Comments 8: Line 307 - What do you mean by ‘news articles’? Provide examples.
Response 8: Thank you for your comment. In this context, ‘news articles’ refers to one of the data sources commonly used in text mining research, alongside academic abstracts and patent data. Text mining techniques are frequently applied to news datasets to analyze trends, public sentiment, and emerging issues in fields such as AI security. Examples include technology news sources (e.g., MIT Technology Review, Wired), cybersecurity news platforms (e.g., Dark Reading, Cybersecurity News), and mainstream media outlets (e.g., The New York Times, The Guardian). We have clarified this in the manuscript to provide better context (lines 350-353).
Comments 9: Fig. 4 and the equation in Line 344 need more detailed explanation (notations, meaning, etc.).
Response 9: Thank you for your suggestion. We have added a more detailed explanation of the equation in line 403, including the definitions and meanings of all notations used. We have also refined the description of Figure 4 to improve the reader’s understanding of the methodology. These revisions make the analysis more transparent and easier to interpret.
Comments 10: The keywords in Tables 2 and 3 seem too general. Consider n-gram analysis for more descriptive phrases.
Response 10: Thank you for your suggestion. We acknowledge that unigram-based keyword extraction may yield overly general terms. To improve specificity, we have expanded the discussion of keyword extraction and incorporated insights from n-gram analysis in the revised manuscript. This yields more descriptive multi-word phrases, improving the clarity and interpretability of the extracted research themes. These refinements provide a more detailed understanding of key trends in GAI security research.
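As a rough illustration of why n-grams help with the issue raised in Comments 10, the sketch below counts bigrams across a few abstracts so that a phrase such as "prompt injection" surfaces as one unit instead of the separate unigrams "prompt" and "injection". The function names `ngrams` and `top_ngrams`, the whitespace tokenizer, and the toy abstracts are hypothetical; the response does not describe the exact n-gram procedure used in the revised manuscript.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return consecutive n-token sequences joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(docs, n=2, k=3):
    """Count n-grams across all documents and return the k most frequent with their counts."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n))  # naive whitespace tokenizer (assumption)
    return counts.most_common(k)

# Hypothetical toy abstracts, not data from the study.
abstracts = [
    "prompt injection attacks threaten large language models",
    "defending large language models against prompt injection",
    "data privacy in large language models",
]
phrases = top_ngrams(abstracts, n=2, k=3)
print(phrases)
```

In practice, stop-word removal and a frequency or association threshold would usually be applied before ranking, since generic phrases such as "large language" can otherwise dominate the list.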
Comments 11: Clarify whether specified keywords correspond to author-defined keywords or were derived using TF-IDF.
Response 11: Thank you for your comment. The keywords presented in the study were derived using TF-IDF analysis rather than solely from author-defined keywords. While author-provided keywords can offer valuable insights, TF-IDF was used to objectively identify the most significant terms based on their importance within the dataset. This data-driven approach captures essential terms beyond those explicitly stated by the authors. We have clarified this in the manuscript for transparency.
Comments 12: The discussion of LDA-derived topics is superficial. Provide deeper characterization and reference high-impact papers.
Response 12: Thank you for your suggestion. We have expanded the discussion of LDA-derived topics by providing a more detailed characterization of each identified theme. Additionally, we have incorporated references to high-impact papers that are representative of each topic, ensuring that the findings are supported by well-cited and influential research. These revisions enhance the depth and clarity of the topic analysis, making the discussion more informative and comprehensive.
Comments 13: Explain how Fig. 9 was generated.
Response 13: Thank you for your comment. Figure 9 was generated using topic modeling and network analysis to visualize the relationships between key research themes in GAI security. LDA (Latent Dirichlet Allocation) identified the major themes, and the top 100 keywords were selected based on TF-IDF scores and probabilistic weighting. In the visualization, blue nodes represent the six primary topics, red nodes represent associated keywords, and edges indicate co-occurrence relationships, with stronger connections indicating higher relevance. The network highlights key cross-cutting issues such as "ChatGPT," "security," "data," "privacy," and "system," illustrating their widespread impact across multiple domains. For example, "ChatGPT" frequently appears in healthcare security, user privacy, and AI risk management, reflecting both its potential and its associated risks, while "security" is central across multiple topics, emphasizing its role in protecting AI systems and data. This structured visualization provides insights into emerging challenges and key research areas in GAI security; a detailed explanation has been added in lines 737-753 of the manuscript.
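The co-occurrence edges described for Figure 9 can be approximated by counting how many documents contain each pair of selected keywords. The sketch below assumes document-level co-occurrence and a minimum-count threshold for keeping an edge; the function name `cooccurrence_edges` and the toy documents are hypothetical, and the code stands in for, rather than reproduces, the NetMiner visualization the authors used.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs, keywords, min_count=1):
    """Count the documents in which each unordered pair of selected keywords appears together."""
    keyword_set = set(keywords)
    edges = Counter()
    for doc in docs:
        # Sorted so each pair is always stored in the same (alphabetical) order.
        present = sorted(keyword_set.intersection(doc.lower().split()))
        for a, b in combinations(present, 2):  # each pair counted at most once per document
            edges[(a, b)] += 1
    # Keep only edges whose weight reaches the threshold.
    return {pair: count for pair, count in edges.items() if count >= min_count}

# Hypothetical toy documents, not data from the study.
docs = [
    "chatgpt raises privacy and security concerns",
    "security of data in chatgpt based systems",
    "privacy preserving data sharing",
]
keywords = ["chatgpt", "security", "data", "privacy"]
edges = cooccurrence_edges(docs, keywords, min_count=2)
print(edges)
```

The resulting weighted pairs can be passed to any graph library as edges; a larger count corresponds to the stronger connections the response mentions. A topic-keyword layer like the blue and red nodes in Figure 9 would additionally require per-topic keyword lists from the LDA output.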
Comments 14: What is the goal of the paper, what kind of readers can it attract (target audience), and what are the benefits of reading this paper?
Response 14: The goal of this paper is to provide the first scientometric analysis of security-related research on Generative Artificial Intelligence (GAI) by systematically analyzing 1,047 peer-reviewed academic articles. Unlike previous studies that focus on conceptual discussions or case studies, this research quantifies and maps the security landscape of GAI using scientometric techniques such as TF-IDF analysis, keyword centrality analysis, and LDA topic modeling. This approach enables a data-driven, structured understanding of the field, identifying key trends, emerging threats, and research gaps. The target audience includes AI security researchers, policymakers, cybersecurity professionals, and industry practitioners who seek an empirical overview of GAI security research. Additionally, this study is relevant to academics in AI ethics, data privacy, and regulatory frameworks, offering insights into the interdisciplinary challenges of securing generative AI models. The benefits of reading this paper include gaining the first comprehensive, quantitative overview of GAI security research trends, identifying underexplored areas, and understanding how research in this field is evolving. By providing a scientometric map of research themes and security concerns, this study serves as a valuable resource for developing AI security policies, strengthening regulatory frameworks, and guiding future academic and industry research efforts.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript has greatly improved. It has a better line of thought, and the build-up has become concrete; in other words, the presentation has been brought, to the best of my knowledge, up to current-day standards. Therefore, I agree that the review process can be ended.
Author Response
Comments 1: The manuscript has greatly improved. It has a better line of thought, and the build-up has become concrete; in other words, the presentation has been brought, to the best of my knowledge, up to current-day standards. Therefore, I agree that the review process can be ended.
Response 1: Thank you very much for your valuable feedback and for taking the time to review our manuscript. We sincerely appreciate your thoughtful comments, which have greatly contributed to improving the clarity and structure of our work. We are pleased to hear that the revised manuscript aligns with current academic standards and that you find the overall presentation more concrete. Your insights have been instrumental in refining our research, and we are grateful for your support throughout the review process. Once again, we truly appreciate your time and effort in evaluating our work.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The authors made some improvements, but some critical comments are still not addressed, or the arguments provided are not sufficient. See below:
Comments 1: Article type should be changed from "Article" to "Review", since it is a typical systematic literature review article. - Not addressed. The comments provided do not convince me. I insist on changing the article type.
Comments 3: In Chapter 1.3 please present an overview of existing review articles on LLMs and security and stress the difference of your manuscript from the existing ones. - Not addressed. I asked for an overview of review articles on LLMs, while the added articles are research articles.
Comments 5: It is stated that the timeframe covers the period up to June 2024. It should be expanded to at least the end of 2024. - Not addressed. The arguments provided are not strong, since indexing typically takes up to two months. June 2024 is too far back, and the review in its current form is outdated.
Comments 6: In Chapter 2.2.4, when introducing LDA, please provide a more detailed description of the method steps. - The mentioned corrections are not visible on the referenced lines.
Comments 8: Variable description under Eq. 1 is missing. - Explanations of variables should be given below the equation, regardless of whether they were explained before.
Author Response
Comments 1: Article type should be changed from "Article" to "Review", since it is a typical systematic literature review article. - Not addressed. The comments provided do not convince me. I insist on changing the article type.
Response 1: Thank you for your suggestion regarding the article type. We understand your concern and appreciate your perspective on this matter. We note that three other reviewers evaluated our manuscript as an "Article," and based on their feedback, we initially kept this classification. Given the differing opinions on this issue, we believe the final decision should rest with the editor, and we are open to their judgment on whether the article type should be changed to "Review." We appreciate your thoughtful insights and will follow the editor’s guidance on this matter.
Comments 2: In Chapter 1.3 please present an overview of existing review articles on LLMs and security and stress the difference of your manuscript from the existing ones. - Not addressed. I asked for an overview of review articles on LLMs, while the added articles are research articles.
Response 2: Thank you for your valuable feedback and for highlighting the need for a more explicit overview of existing review articles on LLMs and security in Chapter 1.3. We carefully reviewed the literature and found that systematic review articles specifically addressing LLM security remain scarce. While numerous research articles discuss individual security aspects of LLMs (such as adversarial attacks, privacy risks, and bias exploitation), comprehensive review papers mapping the broader security landscape of LLMs are limited. To address this, we have clarified this gap in ‘2. Related Work’ and emphasized that our study contributes to the field by offering a systematic, text-mining-based analysis of 1,047 peer-reviewed articles. Unlike prior works that primarily summarize individual security risks, our research uses scientometric techniques to systematically map the evolution of security discussions, the major research clusters, and the interconnections between emerging issues (lines 193-217). We appreciate your guidance in refining our manuscript and believe this clarification strengthens the positioning of our study. Please let us know if you have any further suggestions.
Comments 3: It is stated that the timeframe covers the period up to June 2024. It should be expanded to at least the end of 2024. - Not addressed. The arguments provided are not strong, since indexing typically takes up to two months. June 2024 is too far back, and the review in its current form is outdated.
Response 3: Thank you for your insightful feedback regarding the timeframe of our study. We fully acknowledge the importance of ensuring that our review remains up-to-date and relevant. However, expanding the timeframe to December 2024 would fundamentally alter the dataset and, consequently, the entire analysis. Since our study applies systematic text mining and topic modeling techniques, incorporating additional data would not merely update the review but would necessitate a complete re-execution of the analysis, potentially leading to different findings and conclusions. In essence, this would result in a new study rather than a refinement of the current one. Furthermore, while indexing delays can be a factor, the key security-related trends in generative AI research are already well represented within our current dataset (up to June 2024). We have ensured that our review captures the most recent developments available at the time of writing and have highlighted the need for continuous updates in future research.
Comments 4: In Chapter 2.2.4, when introducing LDA, please provide a more detailed description of the method steps. - The mentioned corrections are not visible on the referenced lines.
Response 4: Thank you for your valuable feedback regarding the LDA methodology in Section 2.2.4. In response to your request for a more detailed explanation of the LDA implementation, we have added a structured description of the key steps: text preprocessing, vectorization, model training, hyperparameter selection, and topic interpretation. These additions clarify how LDA was applied in our study to extract meaningful topics. The revised content appears in lines 442-460 of the manuscript. We appreciate your insightful comments and look forward to your feedback.
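To make the LDA steps listed in Response 4 concrete, below is a compact LDA sampler written from scratch using collapsed Gibbs sampling. It is an illustrative substitute, not the NetMiner implementation the authors report using: the function names `lda_gibbs` and `top_words`, the hyperparameters (alpha, beta, number of topics, iteration count), and the pre-tokenized toy corpus are all assumptions. Preprocessing is assumed to have already produced token lists; vectorization is the mapping of words to integer ids and count tables; training is the sampling loop; and interpretation reads off the highest-probability words per topic.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}  # vectorization: word -> integer id
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # tokens assigned to each topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):     # random initial assignments
        assignments = []
        for w in doc:
            k = rng.randrange(K)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(assignments)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Point estimates from the final sampler state.
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta

def top_words(vocab, phi, n=3):
    """List the n highest-probability words for each topic (topic interpretation step)."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

# Hypothetical pre-tokenized toy corpus, not data from the study.
corpus = [
    ["privacy", "data", "leakage", "privacy"],
    ["jailbreak", "prompt", "attack", "prompt"],
    ["data", "privacy", "regulation"],
    ["prompt", "injection", "attack"],
]
vocab, phi, theta = lda_gibbs(corpus, num_topics=2)
print(top_words(vocab, phi))
```

On a corpus this small the topics can vary with the seed; the point is the sequence of steps, not the specific output. Hyperparameter selection would typically repeat the fit for several values of `num_topics` and compare a coherence score.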
Comments 5: Variable description under Eq. 1 is missing. - Explanations of variables should be given below the equation, regardless of whether they were explained before.
Response 5: Thank you for your feedback regarding the missing variable descriptions in Equation (1). To address this, we have added explicit explanations of all variables directly below the equation, ensuring that their meanings are clearly defined. These include definitions of the coherence score, the number of topics, the number of words per topic, the cosine similarity function, the context vector, and the reference vector (lines 416-434). In addition, we have ensured that the following section on LDA implementation follows this explanation naturally, without redundancy. The revised content has been incorporated into the manuscript accordingly.
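Response 5 describes a coherence score built from cosine similarities between word context vectors and a reference vector. Below is a minimal sketch of a C_V-style calculation of that kind, under simplifying assumptions: probabilities are estimated from document-level co-occurrence rather than a sliding window, each context vector holds NPMI values against the topic's other top words, the reference vector is the sum of all context vectors, and any additional weighting parameter in the manuscript's Equation (1) is omitted. The function names and toy corpus are hypothetical, and this is not the manuscript's exact equation.

```python
import math

def npmi(p_i, p_j, p_ij):
    """Normalized pointwise mutual information, with the usual limiting values."""
    if p_ij == 0.0:
        return -1.0  # the pair never co-occurs
    if p_ij == 1.0:
        return 1.0   # both words occur in every document
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def topic_coherence(top_words, docs):
    """Mean cosine similarity between each word's context vector and the topic reference vector."""
    doc_sets = [set(doc) for doc in docs]
    n = len(doc_sets)

    def p(*words):
        # Fraction of documents containing all the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words)) / n

    context = [[npmi(p(wi), p(wj), p(wi, wj)) for wj in top_words] for wi in top_words]
    reference = [sum(col) for col in zip(*context)]  # sum of all context vectors
    return sum(cosine(vec, reference) for vec in context) / len(top_words)

def model_coherence(topics, docs):
    """Average topic coherence over the number of topics in a model."""
    return sum(topic_coherence(t, docs) for t in topics) / len(topics)

# Hypothetical toy corpus, not data from the study.
docs = [
    ["privacy", "data", "leak"],
    ["privacy", "data"],
    ["jailbreak"],
]
print(topic_coherence(["privacy", "data"], docs))
print(topic_coherence(["privacy", "data", "jailbreak"], docs))
```

Here "privacy" and "data" always appear together, so the topic ["privacy", "data"] scores 1. Adding "jailbreak", which never co-occurs with either word, pulls the score down to about 1/3.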
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have successfully addressed all my comments and concerns.
Author Response
Comments 1: The authors have successfully addressed all my comments and concerns.
Response 1: Thank you for your positive feedback. We appreciate your careful review and constructive comments, which have helped us improve the quality of our manuscript. We are glad that all your concerns have been addressed, and we sincerely appreciate your time and effort in evaluating our work.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The revised paper includes significant extensions and corrections in response to my comments on the first version. I am satisfied with these improvements and with the responses to my comments. However, I suggest adding a short paragraph at the end of Section 1 outlining the structure of the paper (the subsequent sections). In line 412, it would probably be more precise to write “equation (1) from [44]”. In line 424 (equation), it seems that the parameter s and the vectors inside cos(…) are not explained in the paper; some comment is needed.
Author Response
Comments 1: I suggest adding a short paragraph at the end of Section 1 outlining the structure of the paper (the subsequent sections).
Response 1: Thank you for your valuable feedback and constructive suggestions. We appreciate your positive comments on the improvements made in the revised version of our manuscript. We have added a short paragraph at the end of Section 1 that gives an overview of the paper’s structure, outlining the contents of each subsequent section (lines 94-104). These modifications have been incorporated into the revised manuscript. We appreciate your insightful suggestions and look forward to any further feedback.
Comments 2: In line 424 (equation), it seems that the parameter s and the vectors inside cos(…) are not explained in the paper; some comment is needed.
Response 2: Thank you for your valuable feedback. To address your concern, we have added explicit explanations of the missing parameters in Equation (1), including the weighting factor s and the vectors used in the cosine similarity function. These descriptions are now provided directly below the equation for clarity (lines 416-436). We appreciate your careful review and look forward to any further feedback.
Author Response File: Author Response.pdf